Signal peptide-containing proteins

ABSTRACT

The invention provides cDNAs which encode signal peptide-containing proteins. It also provides for the use of a cDNA, protein, and antibody in the diagnosis, prognosis, treatment and evaluation of therapies for cancer. The invention further provides vectors and host cells for the production of the protein and transgenic model systems.

CROSS-REFERENCE TO RELATED APPLICATION(S)

This application is a Continuation of U.S. patent application Ser. No. 11/979,577, filed Nov. 6, 2007, which is a Divisional of U.S. patent application Ser. No. 09/968,433, filed Oct. 1, 2001, now U.S. Pat. No. 7,321,023, which is a Continuation-in-Part of U.S. patent application Ser. No. 09/271,110, filed Mar. 19, 1999, now abandoned, which is a Continuation-in-Part of U.S. patent application Ser. No. 08/966,316, filed Nov. 7, 1997, now U.S. Pat. No. 5,932,445. The entire contents of these applications are incorporated herein by reference in their entirety.

FIELD OF THE INVENTION

This invention relates to signal peptide-containing proteins, their encoding cDNAs, and antibodies which bind the proteins, which can be used in the diagnosis, prognosis, treatment and evaluation of therapies for disorders associated with cell proliferation and cell signaling.

BACKGROUND OF THE INVENTION

Protein transport is a quintessential process for both prokaryotic and eukaryotic cells. Transport of an individual protein usually occurs via an amino-terminal signal sequence which directs, or targets, the protein from its ribosomal assembly site to a particular cellular or extracellular location. Transport may involve any combination of several of the following steps: contact with a chaperone, unfolding, interaction with a receptor and/or a pore complex, addition of energy, and refolding. Moreover, an extracellular protein may be produced as an inactive precursor. Once the precursor has been exported, removal of the signal sequence by a signal peptidase activates the protein.

Although amino-terminal signal sequences vary substantially, many patterns and overall properties are shared. Recently, hidden Markov models (HMMs), statistical alternatives to FASTA and Smith-Waterman algorithms, have been used to find shared patterns, specifically consensus sequences (Pearson and Lipman (1988) Proc. Natl. Acad. Sci. 85:2444-2448; Smith and Waterman (1981) J. Mol. Biol. 147:195-197). Although they were initially developed to examine speech recognition patterns, HMMs have been used in biology to analyze protein and DNA sequences and to model protein structure (Krogh et al. (1994) J. Mol. Biol. 235:1501-1531; Collin et al. (1993) Protein Sci. 2:305-314). HMMs have a formal probabilistic basis and use position-specific scores for amino acids or nucleotides and for opening and extending an insertion or deletion. The algorithms are quite flexible in that they incorporate information from newly identified sequences to build even more successful patterns. To find signal sequences, multiple unaligned sequences are compared to identify those which encode a peptide of 20 to 50 amino acids with an N-terminal methionine.

Some examples of the protein families which are known to have signal sequences are receptors (nuclear, 4 transmembrane, G protein coupled, and tyrosine kinase), cytokines (chemokines), hormones (growth and differentiation factors), neuropeptides and vasomediators, protein kinases, phosphatases, phospholipases, phosphodiesterases, nucleotide cyclases, matrix molecules (adhesion, cadherin, extracellular matrix molecules, integrin, and selectin), G proteins, ion channels (calcium, chloride, potassium, and sodium), proteases, transporter/pumps (amino acid, protein, sugar, metal and vitamin; calcium, phosphate, potassium, and sodium) and regulatory proteins. Receptors, kinases, and matrix proteins and diseases associated with their dysfunction are described below.

G-protein coupled receptors (GPCRs) are a large group of receptors which transduce extracellular signals. GPCRs include receptors for biogenic amines such as dopamine, epinephrine, histamine, glutamate (metabotropic effect), acetylcholine (muscarinic effect), and serotonin; for lipid mediators of inflammation such as prostaglandins, platelet activating factor, and leukotrienes; for peptide hormones such as calcitonin, C5a anaphylatoxin, follicle stimulating hormone, gonadotropin releasing hormone, neurokinin, oxytocin, and thrombin; and for sensory signal mediators such as retinal photopigments and olfactory stimulatory molecules. The structure of these highly-conserved receptors consists of seven hydrophobic transmembrane regions, an extracellular N-terminus, and a cytoplasmic C-terminus. The N-terminus interacts with ligands, and the C-terminus interacts with intracellular G proteins to activate second messengers such as cyclic AMP (cAMP), phospholipase C, inositol triphosphate, or ion channel proteins. Three extracellular loops alternate with three intracellular loops to link the seven transmembrane regions. The most conserved parts of these proteins are the transmembrane regions and the first two cytoplasmic loops. A conserved acidic-Arg-aromatic triplet present in the second cytoplasmic loop may interact with the G proteins. The consensus pattern, [GSTALIVMYWC]-[GSTANCPDE]-{EDPKRH}-x(2)-[LIVMNQGA]-x(2)-[LIVMFT]-[GSTANC]-[LIVMFYWSTAC]-[DENH]-R-[FYWCSH]-x(2)-[LIVM], is characteristic of most proteins belonging to this group (Bolander (1994) Molecular Endocrinology, Academic Press, San Diego Calif.; Strosberg (1991) Eur J Biochem 196:1-10).
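As an illustration only, a consensus pattern written in this PROSITE-style syntax can be translated into a regular expression and used to scan a protein sequence. The Python sketch below is not part of the disclosed methods. The names `prosite_to_regex` and `find_motif` are hypothetical, the translator handles only the syntax elements that occur in the pattern above (bracketed sets, braced exclusions, `x`, and repeat counts), and the 17-residue peptide in the usage note is a synthetic sequence constructed solely to satisfy the pattern.

```python
import re

# Consensus pattern quoted in the text above.
GPCR_PATTERN = (
    "[GSTALIVMYWC]-[GSTANCPDE]-{EDPKRH}-x(2)-[LIVMNQGA]-x(2)-[LIVMFT]-"
    "[GSTANC]-[LIVMFYWSTAC]-[DENH]-R-[FYWCSH]-x(2)-[LIVM]"
)


def prosite_to_regex(pattern):
    """Translate a simple PROSITE-style pattern into a Python regex string."""
    parts = []
    for element in pattern.split("-"):
        # Split an element such as "x(2)" into its core ("x") and count ("2").
        match = re.fullmatch(r"(.+?)(?:\((\d+)\))?", element)
        core, count = match.group(1), match.group(2)
        if core == "x":
            token = "."                      # any residue
        elif core.startswith("{") and core.endswith("}"):
            token = "[^" + core[1:-1] + "]"  # any residue except those listed
        else:
            token = core                     # "[ABC]" set or a single residue
        if count:
            token += "{" + count + "}"
        parts.append(token)
    return "".join(parts)


def find_motif(sequence, pattern=GPCR_PATTERN):
    """Return the 0-based start of the first match, or None if there is none."""
    hit = re.search(prosite_to_regex(pattern), sequence.upper())
    return hit.start() if hit else None
```

For example, `find_motif("MK" + "GSAAALAALGLDRFAAL" + "KK")` locates the synthetic peptide at offset 2, whereas replacing the invariant arginine with lysine (`"GSAAALAALGLDKFAAL"`) yields `None`. A match to the pattern is only suggestive of family membership and would normally be confirmed by alignment.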

The kinases comprise the largest known group of proteins, a superfamily of enzymes with widely varied functions and specificities. Kinases regulate many different cell proliferation, differentiation, and signaling processes by adding phosphate groups to proteins. Receptor mediated extracellular events trigger the transfer of these high energy phosphate groups and activate intracellular signaling cascades. Activation is roughly analogous to turning on a molecular switch and, in cases where signaling is uncontrolled, may be associated with or produce inflammation and cancer.

Kinases are usually named after their substrate, their regulatory molecule, or after some aspect of a mutant phenotype. Almost all kinases contain a similar 250-300 amino acid catalytic domain. The N-terminal domain, which contains subdomains I-IV, generally folds into a two-lobed structure which binds and orients the ATP (or GTP) donor molecule. The larger C terminal lobe, which contains subdomains VIA-XI, binds the protein substrate and carries out the transfer of the gamma phosphate from ATP to the hydroxyl group of a serine, threonine, or tyrosine residue. Subdomain V spans the two lobes.

The kinases may be categorized into families by the different amino acid sequences (between 5 and 100 residues) located on either side of, or inserted into loops of, the kinase domain. These amino acid sequences allow the regulation of each kinase as it recognizes and interacts with its target protein. The primary structure of the kinase domain is conserved and contains specific residues and identifiable motifs or patterns of amino acids. The serine threonine kinases represent one family which preferentially phosphorylates serine or threonine residues. Many serine threonine kinases, including those from human, rabbit, rat, mouse, and chicken cells and tissues, have been described (Hardie and Hanks (1995) The Protein Kinase Facts Books, Vol 1:7-20 Academic Press, San Diego, Calif.).

The matrix proteins (MPs) provide structural support, cell and tissue identity, and autocrine, paracrine and juxtacrine properties for most eukaryotic cells (McGowan (1992) FASEB J 6:2895-2904). MPs include adhesion molecules, integrins and selectins, cadherins, lectins, lipocalins, and extracellular matrix proteins (ECMs). MPs possess many different domains which interact with soluble, extracellular molecules. These domains include collagen-like domains, EGF-like domains, immunoglobulin-like domains, fibronectin-like domains, type A domain of von Willebrand factor (vWFA)-like modules, ankyrin repeat modules, RDG or RDG-like sequences, carbohydrate-binding domains, and calcium ion-binding domains.

For example, multidomain or mosaic proteins play an important role in the diverse functions of the ECMs (Engel et al. (1994) Development S35-42). ECM proteins (ECMPs) are frequently characterized by the presence of one or more domains which may contain a number of potential intracellular disulphide bridge motifs. For example, domains which match the epidermal growth factor tandem repeat consensus are present within several known extracellular proteins that promote cell growth, development, and cell signaling. Other domains share internal homology and a regular distribution of single cysteines and cysteine doublets. In the serum albumin family, cysteine arrangement generates the characteristic “double-loop” structure (Soltysik-Espanola et al. (1994) Dev Biol 165:73-85) important for ligand-binding (Kragh-Hansen (1990) Danish Med Bull 37:57-84). Other ECMPs are members of the vWFA-like module superfamily, a diverse group of proteins with a module sharing high sequence similarity. The vWFA-like module is found not only in plasma proteins, but also in plasma membrane and ECMPs (Colombatti and Bonaldo (1991) Blood 77:2305-2315). Crystal structure analysis of an integrin vWFA-like module shows a classic “Rossmann” fold and suggests a metal ion-dependent adhesion site for binding protein ligands (Lee et al. (1995) Cell 80:631-638).

The diversity, distribution and biochemistry of MPs are indicative of their many, overlapping roles in cell proliferation and cell signaling. MPs function in the formation, growth, remodeling, and maintenance of bone, and in the mediation and regulation of inflammation. Biochemical changes that result from congenital, epigenetic, or infectious diseases affect the expression and balance of MPs. This balance, in turn, affects the activation, proliferation, differentiation, and migration of leukocytes and determines whether the immune response is appropriate or self-destructive (Roman (1996) Immunol. Res. 15:163-178).

Adenylyl cyclases (AC) are a group of second messenger molecules which actively participate in cell signaling processes. There are at least eight types of mammalian ACs which show regions of conserved sequence and are responsive to different stimuli. For example, the neural-specific type I AC is a Ca⁺⁺-stimulated enzyme whereas the human type VII is unresponsive to Ca⁺⁺ and responds to prostaglandin E1 and isoproterenol. Characterization of these ACs, their tissue distribution, and the activators and inhibitors of the different types of ACs is the subject of various investigations (Nielsen et al. (1996) J. Biol. Chem. 271:33308-16; Hellevuo et al. (1995) J. Biol. Chem. 270:11581-9). AC interactions with kinases and G proteins in the intracellular signaling pathways of all tissues make them interesting candidate molecules for pharmaceutical research.

ATP diphosphohydrolase (ATPDase) is an enzyme expressed and secreted by quiescent endothelial cells and involved in vasomediation. The physiological role of ATPDase is to convert ATP and ADP to AMP. When this conversion occurs in the blood vessels during inflammatory response, it prevents extracellular ATP from causing vascular injury by inhibiting platelet activation and modulating vascular thrombosis (Robson et al. (1997) J. Exp. Med. 185:153-63).

The discovery of new signal peptide-containing proteins, their encoding cDNAs, and antibodies which bind the proteins satisfies a long standing need in the art by providing molecules and compositions which can be used in the diagnosis, prognosis, treatment and evaluation of therapies for disorders associated with cell proliferation and cell signaling.

SUMMARY OF THE INVENTION

The present invention is based on the discovery of signal peptide-containing proteins, their encoding cDNAs and antibodies which specifically bind the proteins that are useful in the diagnosis, prognosis, treatment and evaluation of therapies for disorders associated with cell proliferation and cell signaling.

The invention provides an isolated cDNA comprising a nucleic acid molecule selected from SEQ ID NOs: 1-15 and 17-78. SEQ ID NO: 17 encodes a protein having an amino acid sequence of SEQ ID NO: 16. The invention also provides isolated cDNAs comprising SEQ ID NOs: 18-78 which have from about 80% to about 100% sequence identity with SEQ ID NOs: 1-15 and 17. The invention additionally encompasses a complement of the cDNAs selected from SEQ ID NOs: 1-15 and 17-78. In one aspect, the cDNA of SEQ ID NO: 17 is a fragment or an oligonucleotide comprising a nucleic acid molecule selected from A₂₄ to G₄₄, G₁₅₉ to C₁₈₂, G₅₆₁ to A₅₉₆, or A₁₀₁₁ to T₁₀₄₆.

The invention provides compositions comprising the cDNAs or their complements and a heterologous nucleotide sequence or a labeling moiety which may be used in methods of the invention, on a substrate, or in solution. The invention further provides a vector containing the cDNA, a host cell containing the vector, and a method for using the cDNA to make the human protein. The invention still further provides a transgenic cell line or organism comprising the vector containing a cDNA selected from SEQ ID NOs: 1-15 and 17-78. In a second aspect, the invention provides a cDNA or the complement thereof which can be used in methods of detection, screening, and purification. In a further aspect, the cDNA is a single-stranded RNA or DNA molecule, a peptide nucleic acid, a branched nucleic acid, and the like.

The invention provides a method for using a cDNA to detect differential expression of a nucleic acid in a sample comprising hybridizing a cDNA to the nucleic acids, thereby forming hybridization complexes and comparing hybridization complex formation with at least one standard, wherein the comparison indicates differential expression of the cDNA in the sample. In one aspect, the method of detection further comprises amplifying the nucleic acids of the sample prior to hybridization. In another aspect, the method showing differential expression of the cDNA is used to diagnose a cancer.

The invention additionally provides a method for using a composition of the invention to screen a plurality of molecules or compounds to identify or purify at least one ligand which specifically binds the cDNA, the method comprising combining the composition with the molecules or compounds under conditions allowing specific binding, and detecting specific binding to the composition, thereby identifying or purifying a ligand which binds the composition. In one aspect, the molecules or compounds are selected from aptamers, DNA molecules, RNA molecules, peptide nucleic acids, artificial chromosome constructions, peptides, transcription factors, repressors, and regulatory molecules.

The invention provides a purified protein or a portion thereof selected from the group consisting of an amino acid sequence of SEQ ID NO: 16, a variant of SEQ ID NO: 16, an antigenic epitope of SEQ ID NO: 16, and a biologically active portion of SEQ ID NO: 16. The invention also provides a composition comprising the purified protein and a labeling moiety or a pharmaceutical carrier. The invention further provides a method of using the protein to treat a subject with cancer comprising administering to a patient in need of such treatment a composition containing the purified protein and a pharmaceutical carrier. The invention still further provides a method for using a protein to screen a library or a plurality of molecules or compounds to identify or purify at least one ligand, the method comprising combining the protein with the molecules or compounds under conditions to allow specific binding and detecting specific binding, thereby identifying or purifying a ligand which specifically binds the protein. In one aspect, the molecules or compounds are selected from DNA molecules, RNA molecules, peptide nucleic acids, peptides, proteins, mimetics, agonists, antagonists, antibodies, immunoglobulins, inhibitors, and drugs. In another aspect, the ligand is used to treat a subject with a cancer.

The invention provides a method of using a protein having the amino acid sequence of SEQ ID NO: 16 to screen a plurality of antibodies to identify an antibody which specifically binds the protein comprising contacting isolated antibodies with the protein under conditions to form an antibody:protein complex, and dissociating the antibody from the protein, thereby obtaining antibody which specifically binds the protein.

The invention also provides methods for using a protein having the amino acid sequence of SEQ ID NO: 16 to prepare and purify polyclonal and monoclonal antibodies which specifically bind the protein. The method for preparing a polyclonal antibody comprises immunizing an animal with the protein under conditions to elicit an antibody response, isolating animal antibodies, attaching the protein to a substrate, contacting the substrate with isolated antibodies under conditions to allow specific binding to the protein, and dissociating the antibodies from the protein, thereby obtaining purified polyclonal antibodies. The method for preparing and purifying monoclonal antibodies comprises immunizing an animal with a protein under conditions to elicit an antibody response, isolating antibody producing cells from the animal, fusing the antibody producing cells with immortalized cells in culture to form monoclonal antibody producing hybridoma cells, culturing the hybridoma cells, and isolating from culture monoclonal antibodies which specifically bind the protein.

The invention provides purified polyclonal and monoclonal antibodies which bind specifically to a protein. The invention also provides a method for using an antibody to detect expression of a protein in a sample, the method comprising combining the antibody with a sample under conditions which allow the formation of antibody:protein complexes; and detecting complex formation, wherein complex formation indicates expression of the protein in the sample. In one aspect, the amount of complex formation when compared to standards is diagnostic of cancer.

The invention provides a method for inserting a heterologous marker gene into the genomic DNA of a mammal to disrupt the expression of the endogenous polynucleotide. The invention also provides a method for using a cDNA to produce a model system, the method comprising constructing a vector containing a DNA selected from SEQ ID NOs: 1-15 and 17-78, transforming the vector into an embryonic stem cell, selecting a transformed embryonic stem cell, microinjecting the transformed embryonic stem cell into a blastocyst, thereby forming a chimeric blastocyst, transferring the chimeric blastocyst into a pseudopregnant dam, wherein the dam gives birth to a chimeric offspring containing the cDNA in its germ line, and breeding the chimeric mammal to produce a homozygous model system.

BRIEF DESCRIPTION OF THE FIGURES AND TABLE

FIGS. 1A-1E show the amino acid sequence of SP-16 (SEQ ID NO: 16) and nucleic acid sequence of its encoding cDNA (SEQ ID NO: 17). The alignment was produced using MacDNASIS PRO software (Hitachi Software Engineering, South San Francisco Calif.).

FIG. 2 shows the amino acid sequence alignment between SP-16 (2547002; SEQ ID NO: 16) and bovine GPCR (GI 399711; SEQ ID NO: 79) produced using the MEGALIGN program of LASERGENE software (DNASTAR, Madison Wis.).

Table I shows the sequence identification numbers, reference, Incyte Clone number, cDNA library, NCBI sequence identifier and GenBank description for each of the signal peptide-containing proteins encoded by the cDNAs.

DESCRIPTION OF THE INVENTION

It is understood that this invention is not limited to the particular machines, materials and methods described. It is also to be understood that the terminology used herein is for the purpose of describing particular embodiments and is not intended to limit the scope of the present invention which will be limited only by the appended claims. As used herein, the singular forms “a,” “an,” and “the” include plural reference unless the context clearly dictates otherwise. For example, a reference to “a host cell” includes a plurality of such host cells known to those skilled in the art.

Unless defined otherwise, all technical and scientific terms used herein have the same meanings as commonly understood by one of ordinary skill in the art to which this invention belongs. All publications mentioned herein are cited for the purpose of describing and disclosing the cell lines, protocols, reagents and vectors which are reported in the publications and which might be used in connection with the invention. Nothing herein is to be construed as an admission that the invention is not entitled to antedate such disclosure by virtue of prior invention.

DEFINITIONS

“Array” refers to an ordered arrangement of at least two cDNAs or antibodies on a substrate. At least one of the cDNAs or antibodies represents a control or standard, and the other, a cDNA or antibody of diagnostic or therapeutic interest. The arrangement of two to about 40,000 cDNAs or of two to about 40,000 monoclonal or polyclonal antibodies on the substrate assures that the size and signal intensity of each labeled hybridization complex, formed between each cDNA and at least one nucleic acid, or antibody:protein complex, formed between each antibody and at least one protein to which the antibody specifically binds, is individually distinguishable.

The “complement” of a cDNA of the Sequence Listing refers to a nucleic acid molecule which is completely complementary to the cDNA over its full length and which will hybridize to the cDNA or an mRNA under conditions of maximal stringency.

“cDNA” refers to an isolated polynucleotide, nucleic acid molecule, or any fragment or complement thereof. It may have originated recombinantly or synthetically, may be double-stranded or single-stranded, represents coding and noncoding 3′ or 5′ sequence, and generally lacks introns.

A “composition” refers to the polynucleotide and a labeling moiety, a purified protein and a pharmaceutical carrier or a labeling moiety, an antibody and a labeling moiety, and the like.

“Derivative” refers to a cDNA or a protein that has been subjected to a chemical modification. Derivatization of a cDNA can involve substitution of a nontraditional base such as queuosine or of an analog such as hypoxanthine. Derivatization of a protein involves the replacement of a hydrogen by an acetyl, acyl, alkyl, amino, formyl, or morpholino group. Derivative molecules retain the biological activities of the naturally occurring molecules but may confer advantages such as longer lifespan or enhanced activity.

“Differential expression” refers to an increased or up-regulated or a decreased or down-regulated expression as detected by presence, absence or at least two-fold change in the amount or abundance of a transcribed messenger RNA or translated protein in a sample.
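By way of a minimal sketch, and not as a description of any disclosed assay, the definition above can be expressed as a small Python function that compares an abundance measured in a sample with that of a standard. The function name `classify_expression`, its argument names, and the returned labels are hypothetical; the abundance units are arbitrary, and the two-fold threshold is the only value taken from the text.

```python
def classify_expression(sample, standard, fold=2.0):
    """Classify expression in a sample relative to a standard.

    Mirrors the definition in the text: presence, absence, or at least a
    `fold`-fold change in abundance counts as differential expression.
    """
    if sample < 0 or standard < 0:
        raise ValueError("abundances must be non-negative")
    if sample == 0 and standard == 0:
        return "not detected"
    if standard == 0:
        return "up-regulated"      # present only in the sample
    if sample == 0:
        return "down-regulated"    # absent from the sample
    ratio = sample / standard
    if ratio >= fold:
        return "up-regulated"
    if ratio <= 1.0 / fold:
        return "down-regulated"
    return "unchanged"
```

Under these assumptions, a four-fold increase such as `classify_expression(40, 10)` is reported as "up-regulated", while equal abundances are reported as "unchanged".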

Disorders associated with cell proliferation and cell signaling include cancers, genetic, and immune conditions. Each disorder is associated with expression of a signal peptide-containing protein or its specific encoding cDNA. These disorders include, but are not limited to, adenofibromatous hyperplasia as a prognostic of prostate cancer, asthma, arthritis, breast cancers such as ductal, lobular, and adenocarcinomas, Huntington's disease, mucinous cystadenoma of the ovary, renal cell cancer, schizophrenia, stomach tumor, testicular seminoma, transitional cell carcinoma of the bladder, and uterine adenosquamous carcinoma.

“Fragment” refers to a chain of consecutive nucleotides from about 50 to about 4000 base pairs in length. Fragments may be used in PCR or hybridization technologies to identify related nucleic acid molecules and in binding assays to screen for a ligand. Such ligands are useful as therapeutics to regulate replication, transcription or translation.

A “hybridization complex” is formed between a cDNA and a nucleic acid of a sample when the purines of one molecule hydrogen bond with the pyrimidines of the complementary molecule, e.g., 5′-A-G-T-C-3′ base pairs with 3′-T-C-A-G-5′. Hybridization conditions, degree of complementarity and the use of nucleotide analogs affect the efficiency and stringency of hybridization reactions.
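The base-pairing rule in this definition can be sketched in a few lines of Python. This is an illustrative example only; the helper names `complement_strand` and `reverse_complement` are hypothetical, and the sketch assumes unmodified DNA bases (A, C, G, T) with no nucleotide analogs.

```python
# Watson-Crick pairing for unmodified DNA bases.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement_strand(seq_5_to_3):
    """Return the pairing strand written 3'->5', base by base."""
    return "".join(COMPLEMENT[base] for base in seq_5_to_3.upper())


def reverse_complement(seq_5_to_3):
    """Return the pairing strand written in the conventional 5'->3' direction."""
    return complement_strand(seq_5_to_3)[::-1]
```

For the example in the text, `complement_strand("AGTC")` gives "TCAG" (read 3′ to 5′), and `reverse_complement("AGTC")` gives the same strand written 5′ to 3′, "GACT".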

“Labeling moiety” refers to any visible or radioactive label that can be attached to or incorporated into a cDNA or protein. Visible labels include but are not limited to anthocyanins, green fluorescent protein (GFP), β-glucuronidase, luciferase, Cy3 and Cy5, and the like. Radioactive markers include radioactive forms of hydrogen, iodine, phosphorus, sulfur, and the like.

“Ligand” refers to any agent, molecule, or compound which will bind specifically to a polynucleotide or to an epitope of a protein. Such ligands stabilize or modulate the activity of polynucleotides or proteins and may be composed of inorganic and/or organic substances including minerals, cofactors, nucleic acids, proteins, carbohydrates, fats, and lipids.

“Oligonucleotide” refers to a single-stranded molecule from about 18 to about 60 nucleotides in length which may be used in hybridization or amplification technologies or in regulation of replication, transcription or translation. Equivalent terms are amplimer, primer, and oligomer.

An “oligopeptide” is an amino acid sequence from about five residues to about 15 residues that is used as part of a fusion protein to produce an antibody.

“Portion” refers to any part of a protein used for any purpose, but especially to an epitope for the screening of ligands or for the production of antibodies.

“Post-translational modification” of a protein can involve lipidation, glycosylation, phosphorylation, acetylation, racemization, proteolytic cleavage, and the like. These processes may occur synthetically or biochemically. Biochemical modifications will vary by cellular location, cell type, pH, enzymatic milieu, and the like.

“Probe” refers to a cDNA that hybridizes to at least one nucleic acid in a sample. Where targets are single-stranded, probes are complementary single strands. Probes can be labeled with reporter molecules for use in hybridization reactions including Southern, northern, in situ, dot blot, array, and like technologies or in screening assays.

“Protein” refers to a polypeptide or any portion thereof. A “portion” of a protein refers to that length of amino acid sequence which would retain at least one biological activity, a domain identified by PFAM or PRINTS analysis or an antigenic epitope of the protein identified using Kyte-Doolittle algorithms of the PROTEAN program (DNASTAR).

“Purified” refers to any molecule or compound that is separated from its natural environment and is from about 60% free to about 90% free from other components with which it is naturally associated.

“Sample” is used in its broadest sense as containing nucleic acids, proteins, antibodies, and the like. A sample may comprise a bodily fluid; the soluble fraction of a cell preparation, or an aliquot of media in which cells were grown; a chromosome, an organelle, or membrane isolated or extracted from a cell; genomic DNA, RNA, or cDNA in solution or bound to a substrate; a biopsy; a cell; a tissue; a tissue print; a fingerprint, buccal cells, skin, or hair; and the like.

“Similarity” refers to the quantification (usually percentage) of nucleotide or residue matches between at least two sequences aligned using a standard algorithm such as Smith-Waterman alignment (Smith and Waterman (1981) J Mol Biol 147:195-197) or BLAST2 (Altschul et al. (1997) Nucleic Acids Res 25:3389-3402). BLAST2 may be used in a reproducible way to insert gaps in one of the sequences in order to optimize alignment and to achieve a more meaningful comparison between them. Particularly in proteins, similarity is greater than identity in that conservative substitutions (for example, valine for leucine or isoleucine) are counted in calculating the reported percentage. Substitutions which are considered to be conservative are well known in the art.
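The distinction between identity and similarity can be made concrete with a short Python sketch that operates on two sequences that have already been aligned (for instance by one of the algorithms cited above). It is a simplified illustration, not the scoring used by BLAST2 or Smith-Waterman: the function names are hypothetical, gaps are written as "-", percentages are taken over the full alignment length, and the conservative groups listed are an assumed toy set in which only the valine/leucine/isoleucine group comes from the text.

```python
# Assumed, illustrative groups of residues treated as conservative
# substitutions for one another; only V/L/I is taken from the text.
CONSERVATIVE_GROUPS = [
    set("VLI"), set("KR"), set("DE"), set("ST"), set("FYW"), set("NQ"),
]


def is_conservative(a, b):
    """True if residues a and b fall in the same conservative group."""
    return any(a in group and b in group for group in CONSERVATIVE_GROUPS)


def identity_and_similarity(aligned_a, aligned_b):
    """Return (percent identity, percent similarity) for two aligned sequences."""
    if len(aligned_a) != len(aligned_b) or not aligned_a:
        raise ValueError("sequences must be aligned, non-empty, and equal length")
    identical = similar = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue                 # gap columns count toward length only
        if a == b:
            identical += 1
            similar += 1
        elif is_conservative(a, b):
            similar += 1             # counted for similarity, not identity
    length = len(aligned_a)
    return 100.0 * identical / length, 100.0 * similar / length
```

With these assumed groups, `identity_and_similarity("MVLSK", "MILSR")` returns 60% identity but 100% similarity, because the V/I and K/R mismatches are both counted as conservative.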

“Specific binding” refers to a special and precise interaction between two molecules which is dependent upon their structure, particularly their molecular side groups. For example, the intercalation of a regulatory protein into the major groove of a DNA molecule or the binding between an epitope of a protein and an agonist, antagonist, or antibody.

“Substrate” refers to any rigid or semi-rigid support to which cDNAs or proteins are bound and includes membranes, filters, chips, slides, wafers, fibers, magnetic or nonmagnetic beads, gels, capillaries or other tubing, plates, polymers, and microparticles with a variety of surface forms including wells, trenches, pins, channels and pores.

A “transcript image” is a profile of gene transcription activity in a particular tissue at a particular time.

“Variant” refers to molecules that are recognized variations of a cDNA or a protein encoded by the cDNA. Splice variants may be determined by BLAST score, wherein the score is at least 100, and most preferably at least 400. Allelic variants have a high percent identity to the cDNAs and may differ by about three bases per hundred bases. “Single nucleotide polymorphism” (SNP) refers to a change in a single base as a result of a substitution, insertion or deletion. The change may be conservative (purine for purine) or non-conservative (purine to pyrimidine) and may or may not result in a change in an encoded amino acid or its secondary, tertiary, or quaternary structure.
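The conservative versus non-conservative distinction for a single-base substitution can be sketched as follows in Python. The function name `classify_substitution` is hypothetical. The text gives only purine-for-purine as the conservative case; treating pyrimidine-for-pyrimidine as conservative as well is an assumption made here by analogy. Insertions and deletions are outside the scope of this sketch.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(ref, alt):
    """Classify a single-base substitution using the text's terminology."""
    ref, alt = ref.upper(), alt.upper()
    valid = PURINES | PYRIMIDINES
    if ref not in valid or alt not in valid:
        raise ValueError("bases must be one of A, C, G, T")
    if ref == alt:
        raise ValueError("reference and alternate bases are identical")
    # Same chemical class (assumed to include pyrimidine-for-pyrimidine).
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "conservative" if same_class else "non-conservative"
```

For example, `classify_substitution("A", "G")` is reported as "conservative" and `classify_substitution("A", "C")` as "non-conservative"; whether either change alters the encoded amino acid depends on its position in the codon and is not assessed here.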

The Invention

The invention is based on the discovery of signal peptide-containing proteins, individually SP-1 through SP-16, and their encoding or regulating cDNAs, SEQ ID NOs: 1-15 and 17 which are characterized in TABLE I. U.S. Ser. No. 09/271,110, filed 17 Mar. 1999, is incorporated by reference herein in its entirety. The cDNAs and fragments thereof, the proteins and portions thereof, and an antibody which specifically binds each protein can be used directly or as compositions for the diagnosis, prognosis, treatment and evaluation of therapies for disorders associated with cell proliferation and cell signaling.

SP-1 was identified in Incyte Clone 1221102 from the NEUTGMT01 cDNA library using a computer search for amino acid sequence alignments. A cDNA comprising the nucleic acid shown in SEQ ID NO: 1 and derived using Incyte Clone 1221102, which encompasses nucleotides 300-514 also found in Incyte clone 5269342F6 (SEQ ID NO: 18) which was used on HumanGenomeGEM1 microarray, encodes a GPCR with homology to g1575512, the GPR19 gene. Electronic northern analysis showed the expression of this sequence in neuronal tissues and in stimulated granulocytes. The transcript image found in EXAMPLE VIII supported the northern analysis and showed four-fold, up-regulated expression of the cDNA encoding SP-1 in the brain from a subject diagnosed with Huntington's disease.

SP-2 was identified in Incyte Clone 1457779 from the COLNFET02 cDNA library using a computer search for amino acid sequence alignments. A cDNA comprising the nucleic acid shown in SEQ ID NO: 2 and derived from Incyte Clone 1457779, which encompasses nucleotides 1-466 also found in Incyte clone 1457779F6 (SEQ ID NO: 22) which was used on LifeGEM1 microarray, encodes an ATP diphosphohydrolase with homology to g1842120. Electronic northern analysis showed the expression of this sequence in fetal colon, and transcript imaging showed that differential expression of SP-2 is diagnostic of stomach tumor.

SP-3 was identified in Incyte Clone 1682433 from the PROSNOT15 cDNA library using a computer search for amino acid sequence alignments. A cDNA comprising the nucleic acid shown in SEQ ID NO: 3 and derived from Incyte Clone 1682433, which encompasses nucleotides 1-481 also found in Incyte clone 2444114F6 (SEQ ID NO: 26) which was used on LifeGEM1 microarray, encodes a signal peptide-containing protein with homology to g1010391, a transmembrane protein. Electronic northern analysis showed the expression of this sequence in fetal, cancerous or inflamed cells and tissues. Transcript imaging showed that differential expression of SP-3 is diagnostic of ductal carcinoma of the breast.

SP-4 was identified in Incyte Clone 1899132 from the BLADTUT06 cDNA library using a computer search for amino acid sequence alignments. A nucleotide sequence, SEQ ID NO: 4, derived from Incyte Clone 1899132, which encompasses nucleotides 272-625 also found in Incyte clone 1899132F6 (SEQ ID NO: 31) which was used on LifeGEM1 microarray, encodes a signal peptide-containing protein with homology to g887602, a Saccharomyces cerevisiae protein. Electronic northern analysis showed the expression of this sequence in cancerous and inflamed cells and tissues; transcript imaging showed that differential expression of SP-4 is diagnostic of uterine adenosquamous carcinoma.

SP-5 was identified in Incyte Clone 1907344 from the CONNTUT01 cDNA library using a computer search for amino acid sequence alignments. A nucleotide sequence, SEQ ID NO: 5, derived from Incyte Clone 1907344, which encompasses nucleotides 17-450 also found in Incyte clone 2487075F6 (SEQ ID NO: 35) which was used on HumanGenomeGEM1 microarray, encodes a signal peptide-containing protein with homology to g33715, immunoglobulin light chain. Electronic northern analysis showed the expression of this sequence in cancerous and fetal or infant cells and tissues; transcript imaging showed that differential expression of SP-5 is diagnostic for adenocarcinoma of the breast.

SP-6 was identified in Incyte Clone 1963651 from the BRSTNOT04 cDNA library using a computer search for amino acid sequence alignments. A nucleotide sequence, SEQ ID NO: 6, derived from Incyte Clone 1963651, which encompasses nucleotides 651-1090 also found in Incyte clone 1414964F6 (SEQ ID NO: 41) which was used on LifeGEM1 microarray, encodes a GPCR with homology to g1657623, orphan receptor RDC1. Although electronic northern analysis showed expression in ductal carcinoma, transcript imaging showed that differential expression of SP-6 in ovary is diagnostic for mucinous cystadenoma.

SP-7 was identified in Incyte Clone 1976095 from the PANCTUT02 cDNA library using a computer search for amino acid sequence alignments. A nucleotide sequence, SEQ ID NO: 7, derived from Incyte Clone 1976095, which encompasses nucleotides 74-525 also found in Incyte clone 1976095F6 (SEQ ID NO: 44) which was used on LifeGEM1 microarray, encodes a signal peptide-containing protein with homology to g2117185, a Mycobacterium tuberculosis protein. Electronic northern analysis showed the expression of this sequence in cancerous and inflamed tissues; transcript imaging showed that differential expression of SP-7 in synovium or cartilage is diagnostic for arthritis.

SP-8 was identified in Incyte Clone 2417676 from the HNT3AZT01 cDNA library using a computer search for amino acid sequence alignments. A nucleotide sequence, SEQ ID NO: 8, derived from Incyte Clone 2417676, which encompasses nucleotides 2-363 also found in Incyte clone 2890678F6 (SEQ ID NO: 49) which was used on HumanGenomeGEM1 microarray, encodes a signal peptide-containing protein with homology to g2150012, a human transmembrane protein. Electronic northern analysis showed this sequence to be expressed in proliferating, cancerous or inflamed tissues; transcript imaging showed that differential expression of SP-8 is diagnostic for testicular seminoma.

SP-9 was identified in Incyte Clone 1805538 from the SINTNOT13 cDNA library using a computer search for amino acid sequence alignments. A nucleotide sequence, SEQ ID NO: 9, derived from Incyte Clone 1805538, which encompasses nucleotides 15-419 also found in Incyte clone 2183094F6 (SEQ ID NO: 53) which was used on LifeGEM1 microarray, encodes a signal peptide-containing protein with homology to g294502, an extracellular matrix protein. Electronic northern analysis showed this sequence to be expressed in inflamed tissues; transcript imaging showed that differential expression of SP-9 is diagnostic of adenofibromatous hyperplasia and prognostic for prostate cancer.

SP-10 was identified in Incyte Clone 1869688 from the SKINBIT01 cDNA library using a computer search for amino acid sequence alignments. A nucleotide sequence, SEQ ID NO: 10, derived from Incyte Clone 1869688, which encompasses nucleotides 1124-1380 also found in Incyte clone 2182042F6 (SEQ ID NO: 57) which was used on HumanGenomeGEM1 microarray, encodes a signal peptide-containing protein with homology to g1562, a G3 serine/threonine kinase. Electronic northern analysis showed this sequence to be expressed in proliferating tissues; transcript imaging showed that differential expression of SP-10 is diagnostic of transitional cell carcinoma of the bladder.

SP-11 was identified in Incyte Clone 1880692 from the LEUKNOT03 cDNA library using a computer search for amino acid sequence alignments. A nucleotide sequence, SEQ ID NO: 11, derived from Incyte Clone 1880692, which encompasses nucleotides 12-309 also found in Incyte clone 1880692F6 (SEQ ID NO: 60) which was used on LifeGEM1 microarray, encodes a signal peptide-containing protein with homology to g1487910, a C. elegans protein. Electronic northern analysis showed this sequence to be expressed in cancer and blood cells; transcript imaging showed that differential expression of SP-11 is diagnostic for renal cell cancer.

SP-12 was identified in Incyte Clone 318060 from the EOSIHET02 cDNA library using a computer search for amino acid sequence alignments. A nucleotide sequence, SEQ ID NO: 12, derived from Incyte Clone 318060, which encompasses nucleotides 193-1244 also found in Incyte clone 1266985F6 (SEQ ID NO: 64) which was used on HumanGenomeGEM1 microarray, encodes a receptor with homology to g606788, an opioid GPCR. Although electronic northern analysis showed this sequence to be expressed in nerve and blood cells, transcript imaging showed that differential expression of SP-12 is diagnostic for adenocarcinoma of the breast.

SP-13 was identified in Incyte Clone 396450 from the PITUNOT02 cDNA library using a computer search for amino acid sequence alignments. A nucleotide sequence, SEQ ID NO: 13, derived from Incyte Clone 396450, which encompasses nucleotides 1-277 also found in Incyte clone 396450R6 (SEQ ID NO: 65) which was used on LifeGEM1 microarray, encodes a signal peptide-containing protein with homology to g342279, opiomelanocortin. Electronic northern analysis showed this sequence to be expressed in hormone-producing cells and tissues and inflamed cells and tissues; transcript imaging showed that differential expression of SP-13 is diagnostic for schizophrenia.

SP-14 was identified in Incyte Clone 506333 from the TMLR3DT02 cDNA library using a computer search for amino acid sequence alignments. A nucleotide sequence, SEQ ID NO: 14, derived from Incyte Clone 506333, which encompasses nucleotides 1-514 also found in Incyte clone 506333T6 (SEQ ID NO: 68) which was used on LifeGEM1 microarray, encodes a signal peptide-containing protein with homology to g2204110, adenylyl cyclase. Electronic northern analysis showed this sequence to be expressed in cancerous and inflamed cells and tissues; transcript imaging showed that differential expression of SP-14 is diagnostic of breast cancer, in particular lobular carcinoma of the breast.

SP-15 was identified in Incyte Clone 764465 from the LUNGNOT04 cDNA library using a computer search for amino acid sequence alignments. A nucleotide sequence, SEQ ID NO: 15, derived from Incyte Clone 764465, which encompasses nucleotides 49-528 also found in Incyte clone 764465R6 (SEQ ID NO: 69) which was used on LifeGEM1 microarray, encodes a receptor with homology to g1902984, lectin-like oxidized LDL receptor. Electronic northern analysis showed this sequence to be expressed in lung and in fetal liver; transcript imaging confirmed the northern analysis and showed that differential expression of SP-15 when used with lung samples is diagnostic for asthma.

SP-16 (SEQ ID NO: 16) was identified in Incyte Clone 2547002 from the UTRSNOT11 cDNA library using a computer search for amino acid sequence alignments. A consensus sequence, SEQ ID NO: 17, was derived from the extension and assembly of the overlapping nucleic acid sequences of Incyte Clones 2741185T6, 2741185T6F6.comp, and 2741185H1 (BRSTIUT14) and 2547002F6 and 2547002H1 (UTRSNOT11), SEQ ID NOs: 72-76, respectively.

In one embodiment, the invention encompasses a polypeptide comprising the amino acid sequence of SEQ ID NO: 16. As shown in FIGS. 1A-1E, SP-16 is 350 amino acids in length and has a G protein coupled receptor signature at S₁₂₅GMQFLACISIDRYVAV; three potential N-glycosylation sites at N₆, N₁₉, and N₂₇₆; a potential glycosaminoglycan attachment site at S₁₄₈; and ten potential phosphorylation sites at S₂₅, T₇₄, T₁₇₇, S₁₉₅, T₂₂₃, Y₂₆₉, S₂₇₈, S₃₀₉, S₃₂₃, and S₃₃₀. SP-16 has 86% sequence identity with a bovine GPCR (g399711) and shares the GPCR signature, the N-glycosylation sites, the glycosaminoglycan attachment site, and the first nine of the phosphorylation sites with the bovine receptor (FIG. 2). Fragments of the nucleic acid molecule useful for designing oligonucleotides or to be used directly as hybridization probes to distinguish between these homologous molecules include A₂₄ to G₄₄, G₁₅₉ to C₁₈₂, G₅₆₁ to A₅₉₆, or A₁₀₁₁ to T₁₀₄₆. mRNA encoding SP-16 was sparsely expressed in cDNA libraries. Electronic northern analysis (EXAMPLE VIII below) showed expression in breast adenocarcinoma; transcript imaging confirmed the northern analysis and showed that SP-16 is differentially expressed in breast adenocarcinoma and not in matched or normal breast tissues.

cDNA fragments encoding or regulating signal peptide-containing proteins were identified using BLAST2 with default parameters and the ZOOSEQ databases (Incyte Genomics, Palo Alto Calif.). These cDNAs have from about 80% to about 95% sequence identity to the human cDNA as shown in the table below. The first column shows the SEQ ID_(H) for the human cDNA; the second column, the SEQ ID_(FR) for fragment cDNAs; the third column, the sequence numbers for the fragments; the fourth column, the species; the fifth column, percent identity to the human cDNA; and the sixth column, the nucleotide alignment (Nt_(H)) of the human and fragment cDNAs.

SEQ ID_(H)  SEQ ID_(FR)  Clone No.     Species  Identity  Nt_(H) Alignment
1           19           051293_Mm.1   Mouse    80%       1-518
1           20           703901370J1   Rat      84%       1-518
1           21           296771_Rn.1   Rat      81%       1-518
2           23           023793_Mm.1   Mouse    83%       307-606
2           24           701923941H1   Rat      84%       402-606
2           25           317489_Rn.1   Rat      84%       402-606
3           27           703711491J1   Dog      89%       817-1075
3           28           060931_Mm.3   Mouse    85%       95-1099
3           29           701926832H1   Rat      88%       801-1033
3           30           317017_Rn.1   Rat      88%       801-1033
4           32           026438_Mm.1   Mouse    84%       311-861
4           33           70298994H1    Rat      86%       489-731
4           34           286037_Rn.1   Rat      86%       341-731
5           36           703200737J1   Monkey   90%       280-450
5           37           071816_Mf.2   Monkey   86%       280-450
5           38           008837_Cf.1   Dog      89%       38-361
5           39           700298833H1   Rat      92%       263-450
5           40           274060_Rn.1   Rat      92%       263-450
6           42           031166_Mm.1   Mouse    87%       201-1803
6           43           203462_Rn.3   Rat      87%       776-1261
7           45           005653_Mf.1   Monkey   90%       519-700
7           46           007876_Cf.1   Dog      89%       134-700
7           47           003508_Mm.1   Mouse    83%       98-668
7           48           205363_Rn.4   Rat      84%       74-700
8           50           008780_Cf.1   Dog      93%       186-296
8           51           013606_Mm.1   Mouse    86%       37-357
8           52           248462_Rn.1   Rat      89%       110-313
9           54           001680_Cf.1   Dog      85%       148-201
9           55           021581_Mm.1   Mouse    82%       232-532
9           56           283960_Rn.1   Rat      86%       232-307
10          58           037196_Mm.1   Mouse    90%       192-1040
10          59           215631_Rn.1   Rat      88%       170-651
11          61           023463_Cf.1   Dog      90%       93-363
11          62           017863_Mm.1   Mouse    85%       179-619
11          63           300968_Rn.1   Rat      82%       179-647
13          66           019409_Mm.2   Mouse    83%       136-272
13          67           216194_Rn.7   Rat      84%       134-272
15          70           028681_Mm.2   Mouse    80%       54-215
15          71           211274_Rn.1   Rat      88%       56-114
17          77           000569_Mm.1   Mouse    87%       789-1091
17          78           251020_Rn.1   Rat      83%       180-820

It will be appreciated by those skilled in the art that as a result of the degeneracy of the genetic code, a multitude of cDNAs encoding signal peptide-containing proteins, some bearing minimal similarity to the cDNAs of any known and naturally occurring gene, may be produced. Thus, the invention contemplates each and every possible variation of cDNA that could be made by selecting combinations based on possible codon choices. These combinations are made in accordance with the standard triplet genetic code as applied to the polynucleotide encoding naturally occurring signal peptide-containing proteins, and all such variations are to be considered as being specifically disclosed.
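
The scale of the codon degeneracy described above can be illustrated with a minimal sketch. It assumes the standard genetic code and simply multiplies the number of synonymous codons available at each residue; the function name `count_encoding_cdnas` and the codon-count table are illustrative helpers, not part of the disclosure.

```python
from math import prod

# Number of codons per amino acid in the standard genetic code
# (one-letter amino acid codes; the 3 stop codons are excluded, so values sum to 61).
CODON_COUNTS = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4,
    "H": 2, "I": 3, "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6,
    "T": 4, "W": 1, "Y": 2, "V": 4,
}


def count_encoding_cdnas(peptide: str) -> int:
    """Return how many distinct coding sequences encode `peptide`.

    Hypothetical helper: the stop codon is ignored and every synonymous
    codon is treated as an equally valid choice.
    """
    return prod(CODON_COUNTS[aa] for aa in peptide.upper())


if __name__ == "__main__":
    # The 17-residue GPCR signature reported for SP-16 in the text.
    signature = "SGMQFLACISIDRYVAV"
    print(count_encoding_cdnas(signature))
```

Because the count is a product over residues, it grows roughly exponentially with peptide length, which is why the text speaks of "a multitude" of possible cDNAs.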

The cDNAs of SEQ ID NOs: 1-15 and 17-78 may be used in hybridization, amplification, and screening technologies to identify and distinguish among the identical and related molecules in a sample. The cDNAs may also be used to produce transgenic cell lines or organisms which are model systems for cancers and upon which the toxicity and efficacy of potential therapeutic treatments may be tested. Toxicology studies, clinical trials, and subject/patient treatment profiles may be performed and monitored using the cDNAs, proteins, antibodies and molecules and compounds identified using the cDNAs and proteins of the present invention.

Characterization and Use of the Invention

cDNA Libraries

In a particular embodiment disclosed herein, mRNA is isolated from cells and tissues using methods which are well known to those skilled in the art and used to prepare the cDNA libraries. The Incyte cDNAs were isolated from cDNA libraries prepared as described in the EXAMPLES. The consensus sequences are chemically and/or electronically assembled from fragments including Incyte cDNAs and extension and/or shotgun sequences using computer programs such as PHRAP (P Green, University of Washington, Seattle Wash.), and the AUTOASSEMBLER application (ABI). After verification of the 5′ and 3′ sequence, at least one of the representative cDNAs which encode a signal peptide-containing protein is designated a reagent. These reagent cDNAs are also used in the construction of human microarrays and are represented among the sequences on the Human Genome Gem Arrays (Incyte Genomics).

Sequencing

Methods for sequencing nucleic acids are well known in the art and may be used to practice any of the embodiments of the invention. These methods employ enzymes such as the Klenow fragment of DNA polymerase I, SEQUENASE, Taq DNA polymerase and thermostable T7 DNA polymerase (Amersham Pharmacia Biotech (APB), Piscataway N.J.), or combinations of polymerases and proofreading exonucleases such as those found in the ELONGASE amplification system (Life Technologies, Gaithersburg Md.). Preferably, sequence preparation is automated with machines such as the MICROLAB 2200 system (Hamilton, Reno Nev.) and the DNA ENGINE thermal cycler (MJ Research, Watertown Mass.). Machines commonly used for sequencing include the ABI PRISM 3700, 377 or 373 DNA sequencing systems (ABI), the MEGABACE 1000 DNA sequencing system (APB), and the like. The sequences may be analyzed using a variety of algorithms well known in the art and described in Ausubel et al. (1997; Short Protocols in Molecular Biology, John Wiley & Sons, New York N.Y., unit 7.7) and in Meyers (1995; Molecular Biology and Biotechnology, Wiley VCH, New York N.Y., pp. 856-853).

Shotgun sequencing may also be used to complete the sequence of a particular cloned insert of interest. Shotgun strategy involves randomly breaking the original insert into segments of various sizes and cloning these fragments into vectors. The fragments are sequenced and reassembled using overlapping ends until the entire sequence of the original insert is known. Shotgun sequencing methods are well known in the art and use thermostable DNA polymerases, heat-labile DNA polymerases, and primers chosen from representative regions flanking the cDNAs of interest. Incomplete assembled sequences are inspected for identity using various algorithms or programs such as CONSED (Gordon (1998) Genome Res 8:195-202) which are well known in the art. Contaminating sequences, including vector or chimeric sequences, or deleted sequences can be removed or restored, respectively, organizing the incomplete assembled sequences into finished sequences.
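
The reassembly of fragments by their overlapping ends can be sketched as follows. This is a toy greedy merge under simplifying assumptions (error-free reads, a single strand, no repeats); it is not the algorithm used by PHRAP, AUTOASSEMBLER, or CONSED, and the function names and the `min_len` parameter are hypothetical.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 when no overlap of at least `min_len` bases exists.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(fragments: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of fragments sharing the longest overlap.

    Stops when no remaining pair overlaps by `min_len` bases or more and
    returns the resulting contigs.
    """
    contigs = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    while True:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return contigs
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for n, c in enumerate(contigs) if n not in (best_i, best_j)]
        contigs.append(merged)


if __name__ == "__main__":
    reads = ["ATGGCGT", "GCGTACC", "TACCTGA"]  # made-up example reads
    print(greedy_assemble(reads))
```

Real assemblers additionally weigh base quality, handle both strands, and screen out vector and chimeric sequence, as the paragraph above notes.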

Extension of a Nucleic Acid Molecule

The sequences of the invention may be extended using various PCR-based methods known in the art. For example, the XL-PCR kit (ABI), nested primers, and commercially available cDNA or genomic DNA libraries may be used to extend the . . . For all PCR-based methods, primers may be designed using commercially available software to be about 22 to 30 nucleotides in length, to have a GC content of about 50% or more, and to anneal to a target molecule at temperatures from about 55° C. to about 68° C. When extending a sequence to recover regulatory elements, it is preferable to use genomic, rather than cDNA, libraries.
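
The primer criteria stated above (length, GC content, annealing temperature) can be checked with a short sketch. The melting temperature estimate uses a common rule-of-thumb formula for oligonucleotides, Tm = 64.9 + 41(G + C - 16.4)/N, which is an assumption introduced here and is used only as a rough proxy for the annealing temperature; commercial primer design software uses more detailed models. All function names and default thresholds are illustrative.

```python
def gc_fraction(primer: str) -> float:
    """Fraction of G and C bases in the primer."""
    seq = primer.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def estimate_tm(primer: str) -> float:
    """Rough melting temperature in degrees C (rule-of-thumb approximation).

    Assumed formula: Tm = 64.9 + 41 * (nG + nC - 16.4) / N.
    """
    seq = primer.upper()
    gc_count = seq.count("G") + seq.count("C")
    return 64.9 + 41.0 * (gc_count - 16.4) / len(seq)


def primer_ok(
    primer: str,
    min_len: int = 22,
    max_len: int = 30,
    min_gc: float = 0.50,
    tm_range: tuple[float, float] = (55.0, 68.0),
) -> bool:
    """Check a primer against the length, GC, and temperature ranges in the text."""
    if not (min_len <= len(primer) <= max_len):
        return False
    if gc_fraction(primer) < min_gc:
        return False
    low, high = tm_range
    return low <= estimate_tm(primer) <= high


if __name__ == "__main__":
    candidate = "ATGCGCATGCGCATGCGCATGCGC"  # made-up 24-mer
    print(len(candidate), round(gc_fraction(candidate), 2), primer_ok(candidate))
```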

Hybridization

The cDNA and fragments thereof can be used in hybridization technologies for various purposes. A probe may be designed or derived from unique regions such as the 5′ regulatory region or from a nonconserved region (i.e., 5′ or 3′ of the nucleotides encoding the conserved catalytic domain of the protein) and used in protocols to identify naturally occurring molecules encoding a signal peptide-containing protein, allelic variants, or related molecules. The probe may be DNA or RNA, may be single-stranded, and should have at least 50% sequence identity to a nucleic acid molecule selected from SEQ ID NOs: 1-15 and 17-78. Hybridization probes may be produced using oligolabeling, nick translation, end-labeling, or PCR amplification in the presence of a reporter molecule. A vector containing the cDNA or a fragment thereof may be used to produce an mRNA probe in vitro by addition of an RNA polymerase and labeled nucleotides. These procedures may be conducted using commercially available kits.

The stringency of hybridization is determined by G+C content of the probe, salt concentration, and temperature. In particular, stringency can be increased by reducing the concentration of salt or raising the hybridization temperature. Hybridization can be performed at low stringency with buffers, such as 5×SSC with 1% sodium dodecyl sulfate (SDS) at 60° C., which permits the formation of a hybridization complex between nucleic acids that contain some mismatches. Subsequent washes are performed at higher stringency with buffers such as 0.2×SSC with 0.1% SDS at either 45° C. (medium stringency) or 68° C. (high stringency). At high stringency, hybridization complexes will remain stable only where the nucleic acids are completely complementary. In some membrane-based hybridizations, formamide, preferably 35% or most preferably 50%, can be added to the hybridization solution to reduce the temperature at which hybridization is performed, and background signals can be reduced by the use of detergents such as Sarkosyl or TRITON X-100 (Sigma-Aldrich, St. Louis Mo.) and a blocking agent such as denatured salmon sperm DNA. Selection of components and conditions for hybridization are well known to those skilled in the art and are reviewed in Ausubel (supra) and Sambrook et al. (1989) Molecular Cloning, A Laboratory Manual, Cold Spring Harbor Press, Plainview N.Y.
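
How G+C content, salt, and formamide interact can be illustrated with a widely used empirical approximation for the melting temperature of long DNA:DNA hybrids. The sketch below is an assumption-laden illustration, not a formula taken from this document: the coefficients (81.5, 16.6, 0.41, 0.61, 500) vary somewhat between references, and the conversion of 1×SSC to about 0.195 M sodium ion assumes the standard recipe of 0.15 M NaCl plus 0.015 M trisodium citrate. The function name `hybrid_tm` is hypothetical.

```python
from math import log10

NA_MOLAR_PER_1X_SSC = 0.195  # assumed: 0.15 M NaCl + 3 * 0.015 M from trisodium citrate


def hybrid_tm(
    gc_percent: float,
    length: int,
    ssc_x: float,
    formamide_percent: float = 0.0,
) -> float:
    """Approximate Tm (degrees C) of a DNA:DNA hybrid.

    Assumed empirical form:
    Tm = 81.5 + 16.6*log10[Na+] + 0.41*(%GC) - 0.61*(%formamide) - 500/length
    """
    sodium = NA_MOLAR_PER_1X_SSC * ssc_x
    return (
        81.5
        + 16.6 * log10(sodium)
        + 0.41 * gc_percent
        - 0.61 * formamide_percent
        - 500.0 / length
    )


if __name__ == "__main__":
    # Hypothetical 500-bp probe with 50% G+C under the buffers named in the text.
    for label, ssc, formamide in [
        ("5xSSC", 5.0, 0.0),
        ("0.2xSSC", 0.2, 0.0),
        ("5xSSC + 50% formamide", 5.0, 50.0),
    ]:
        print(label, round(hybrid_tm(50.0, 500, ssc, formamide), 1))
```

Under this approximation, lowering the salt concentration or adding formamide lowers the Tm, so a wash at a fixed temperature becomes more stringent, which is consistent with the relationships described in the paragraph above.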

Arrays incorporating cDNAs or antibodies may be prepared and analyzed using methods well known in the art. Oligonucleotides or cDNAs may be used as hybridization probes or targets to monitor the expression level of large numbers of genes simultaneously or to identify genetic variants, mutations, and single nucleotide polymorphisms. Monoclonal or polyclonal antibodies may be used to detect or quantify expression of a protein in a sample. Such arrays may be used to determine gene function; to understand the genetic basis of a condition, disease, or disorder; to diagnose a condition, disease, or disorder; and to develop and monitor the activities of therapeutic agents. (See, e.g., Brennan et al. (1995) U.S. Pat. No. 5,474,796; Schena et al. (1996) Proc Natl Acad Sci 93:10614-10619; Heller et al. (1997) Proc Natl Acad Sci 94:2150-2155; Heller et al. (1997) U.S. Pat. No. 5,605,662; and deWildt et al. (2000) Nature Biotechnol 18:989-994.)

Hybridization probes are also useful in mapping the naturally occurring genomic sequence. The probes may be hybridized to a particular chromosome, a specific region of a chromosome, or an artificial chromosome construction. Such constructions include human artificial chromosomes (HAC), yeast artificial chromosomes (YAC), bacterial artificial chromosomes (BAC), bacterial P1 constructions, or the cDNAs of libraries made from single chromosomes.

Expression

Any one of a multitude of cDNAs encoding a signal peptide-containing protein may be cloned into a vector and used to express the protein, or portions thereof, in host cells. The cDNA can be engineered by such methods as DNA shuffling, as described in U.S. Pat. No. 5,830,721, and site-directed mutagenesis to create new restriction sites, alter glycosylation patterns, change codon preference to increase expression in a particular host, produce splice variants, extend half-life, and the like. The expression vector may contain transcriptional and translational control elements (promoters, enhancers, specific initiation signals, and polyadenylated 3′ sequence) from various sources which have been selected for their efficiency in a particular host. The vector, cDNA, and regulatory elements are combined using in vitro recombinant DNA techniques, synthetic techniques, and/or in vivo genetic recombination techniques well known in the art and described in Sambrook (supra, ch. 4, 8, 16 and 17).

A variety of host systems may be transformed with an expression vector. These include, but are not limited to, bacteria transformed with recombinant bacteriophage, plasmid, or cosmid DNA expression vectors; yeast transformed with yeast expression vectors; insect cell systems transformed with baculovirus expression vectors; plant cell systems transformed with expression vectors containing viral and/or bacterial elements; or animal cell systems (Ausubel supra, unit 16). For example, an adenovirus transcription/translation complex may be utilized in mammalian cells. After sequences are ligated into the E1 or E3 region of the viral genome, the infective virus is used to transform and express the protein in host cells. The Rous sarcoma virus enhancer or SV40 or EBV-based vectors may also be used for high-level protein expression.

Routine cloning, subcloning, and propagation of cDNAs can be achieved using the multifunctional PBLUESCRIPT vector (Stratagene, La Jolla Calif.) or PSPORT1 plasmid (Life Technologies). Introduction of a cDNA into the multiple cloning site of these vectors disrupts the lacZ gene and allows colorimetric screening for transformed bacteria. In addition, these vectors may be useful for in vitro transcription, dideoxy sequencing, single strand rescue with helper phage, and creation of nested deletions in the cloned sequence.

For long term production of recombinant proteins, the vector can be stably transformed into cell lines along with a selectable or visible marker gene on the same or on a separate vector. After transformation, cells are allowed to grow for about 1 to 2 days in enriched media and then are transferred to selective media. Selectable markers, such as antimetabolite, antibiotic, or herbicide resistance genes, confer resistance to the relevant selective agent and allow growth and recovery of cells which successfully express the introduced sequences. Resistant clones identified either by survival on selective media or by the expression of visible markers may be propagated using culture techniques. Visible markers are also used to estimate the amount of protein expressed by the introduced genes. Verification that the host cell contains the desired cDNA is based on DNA-DNA or DNA-RNA hybridizations or PCR amplification techniques.

The host cell may be chosen for its ability to modify a recombinant protein in a desired fashion. Such modifications include acetylation, carboxylation, glycosylation, phosphorylation, lipidation, acylation and the like. Post-translational processing which cleaves a “prepro” form may also be used to specify protein targeting, folding, and/or activity. Different host cells available from the ATCC (Manassas Va.) which have specific cellular machinery and characteristic mechanisms for post-translational activities may be chosen to ensure the correct modification and processing of the recombinant protein.

Recovery of Proteins from Cell Culture

Heterologous moieties engineered into a vector for ease of purification include glutathione S-transferase (GST), 6×His, FLAG, MYC, and the like. GST and 6×His are purified using commercially available affinity matrices such as immobilized glutathione and metal-chelate resins, respectively. FLAG and MYC are purified using commercially available monoclonal and polyclonal antibodies. For ease of separation following purification, a sequence encoding a proteolytic cleavage site may be part of the vector located between the protein and the heterologous moiety. Methods for recombinant protein expression and purification are discussed in Ausubel (supra, unit 16) and are commercially available.

Chemical Synthesis of Peptides

Proteins or portions thereof may be produced not only by recombinant methods, but also by using chemical methods well known in the art. Solid phase peptide synthesis may be carried out in a batchwise or continuous flow process which sequentially adds α-amino- and side chain-protected amino acid residues to an insoluble polymeric support via a linker group. A linker group such as methylamine-derivatized polyethylene glycol is attached to poly(styrene-co-divinylbenzene) to form the support resin. The amino acid residues are N-α-protected by acid-labile Boc (t-butyloxycarbonyl) or base-labile Fmoc (9-fluorenylmethoxycarbonyl). The carboxyl group of the protected amino acid is coupled to the amine of the linker group to anchor the residue to the solid phase support resin. Trifluoroacetic acid or piperidine are used to remove the protecting group in the case of Boc or Fmoc, respectively. Each additional amino acid is added to the anchored residue using a coupling agent or pre-activated amino acid derivative, and the resin is washed. The full length peptide is synthesized by sequential deprotection, coupling of derivatized amino acids, and washing with dichloromethane and/or N,N-dimethylformamide. The peptide is cleaved between the peptide carboxy terminus and the linker group to yield a peptide acid or amide. These processes are described in the Novabiochem 1997/98 Catalog and Peptide Synthesis Handbook (San Diego Calif., pp. S1-S20). Automated synthesis may also be carried out on machines such as the ABI 431A peptide synthesizer (ABI). A protein or portion thereof may be purified by preparative high performance liquid chromatography and its composition confirmed by amino acid analysis or by sequencing (Creighton (1984) Proteins, Structures and Molecular Properties, WH Freeman, New York N.Y.).

Preparation and Screening of Antibodies

Various hosts including, but not limited to, goats, rabbits, rats, mice, and human cell lines may be immunized by injection with a signal peptide-containing protein or any immunogenic portion thereof. Adjuvants such as Freund's, mineral gels, and surface active substances such as lysolecithin, pluronic polyols, polyanions, peptides, oil emulsions, keyhole limpet hemocyanin (KLH), and dinitrophenol may be used to increase immunological response. The oligopeptide, peptide, or portion of protein used to induce antibodies should consist of at least about five amino acids, more preferably ten amino acids, which are identical to a portion of the natural protein. Oligopeptides may be fused with proteins such as KLH in order to produce antibodies to the chimeric molecule.

Monoclonal antibodies may be prepared using any technique which provides for the production of antibodies by continuous cell lines in culture. These include, but are not limited to, the hybridoma technique, the human B-cell hybridoma technique, and the EBV-hybridoma technique. (See, e.g., Kohler et al. (1975) Nature 256:495-497; Kozbor et al. (1985) J. Immunol Methods 81:31-42; Cote et al. (1983) Proc Natl Acad Sci 80:2026-2030; and Cole et al. (1984) Mol Cell Biol 62:109-120.)

Alternatively, techniques described for antibody production may be adapted, using methods known in the art, to produce epitope-specific, single chain antibodies. Antibody fragments which contain specific binding sites for epitopes of the protein may also be generated. For example, such fragments include, but are not limited to, F(ab′)2 fragments produced by pepsin digestion of the antibody molecule and Fab fragments generated by reducing the disulfide bridges of the F(ab′)2 fragments. Alternatively, Fab expression libraries may be constructed to allow rapid and easy identification of monoclonal Fab fragments with the desired specificity. (See, e.g., Huse et al. (1989) Science 246:1275-1281.)

A signal peptide-containing protein, or a portion thereof, may be used in screening assays of phagemid or B-lymphocyte immunoglobulin libraries to identify antibodies having the desired specificity. Numerous protocols for competitive binding or immunoassays using either polyclonal or monoclonal antibodies with established specificities are well known in the art. Such immunoassays typically involve the measurement of complex formation between the protein and its specific antibody. A two-site, monoclonal-based immunoassay utilizing monoclonal antibodies reactive to two non-interfering epitopes is preferred, but a competitive binding assay may also be employed (Pound (1998) Immunochemical Protocols, Humana Press, Totowa N.J.).

Labeling of Molecules for Assay

A wide variety of reporter molecules and conjugation techniques are known by those skilled in the art and may be used in various nucleic acid, amino acid, and antibody assays. Synthesis of labeled molecules may be achieved using commercially available kits (Promega, Madison Wis.) for incorporation of a labeled nucleotide such as ³²P-dCTP (APB), Cy3-dCTP or Cy5-dCTP (Operon Technologies, Alameda Calif.), or amino acid such as ³⁵S-methionine (APB). Nucleotides and amino acids may be directly labeled with a variety of substances including fluorescent, chemiluminescent, or chromogenic agents, and the like, by chemical conjugation to amines, thiols and other groups present in the molecules using reagents such as BODIPY or FITC (Molecular Probes, Eugene Oreg.).

Diagnostics

Nucleic Acid Assays

The cDNAs, fragments, oligonucleotides, complementary RNA and DNA molecules, and PNAs may be used to detect and quantify differential gene expression for diagnostic purposes. Disorders associated with expression of SP-1 through SP-16 include, but are not limited to, adenofibromatous hyperplasia as a prognostic of prostate cancer, asthma, arthritis, breast cancers such as ductal, lobular, and adenocarcinomas, Huntington's disease, mucinous cystadenoma of the ovary, renal cell cancer, schizophrenia, stomach tumor, testicular seminoma, transitional cell carcinoma of the bladder, and uterine adenosquamous carcinoma. The diagnostic assay may use hybridization or quantitative PCR to compare gene expression in a biological or biopsied subject sample to standard samples in order to detect differential gene expression. Qualitative and quantitative methods for this comparison are commercially available and well known in the art.

For example, the cDNA or probe may be labeled by standard methods and added to a biological sample from a subject under conditions for the formation of hybridization complexes. After an incubation period, the sample is washed and the amount of label (or signal) associated with hybridization complexes is quantified and compared with a standard value. If complex formation in the subject sample is significantly altered (higher or lower) in comparison to either a normal or disease standard, then differential expression indicates the presence of a disorder.
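
The comparison of a subject signal with a standard value can be sketched as a simple fold-change test. The two-fold default threshold is a hypothetical placeholder, since the text does not define what counts as "significantly altered"; a real assay would set the cutoff from replicate measurements and appropriate statistics. The function name is likewise illustrative.

```python
def classify_expression(
    subject_signal: float,
    standard_signal: float,
    fold_threshold: float = 2.0,
) -> str:
    """Classify a subject's hybridization signal relative to a standard.

    Returns "up", "down", or "unchanged". `fold_threshold` is an assumed
    cutoff, not a value specified in the document.
    """
    if subject_signal < 0 or standard_signal <= 0:
        raise ValueError("signals must be non-negative and the standard positive")
    ratio = subject_signal / standard_signal
    if ratio >= fold_threshold:
        return "up"
    if ratio <= 1.0 / fold_threshold:
        return "down"
    return "unchanged"


if __name__ == "__main__":
    # Made-up signal intensities: subject vs. normal standard.
    print(classify_expression(400.0, 100.0))
    print(classify_expression(110.0, 100.0))
```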

In order to provide standards for establishing differential expression, normal and disease expression profiles are established. This is accomplished by combining a sample taken from normal subjects, either animal or human, with a cDNA under conditions for hybridization to occur. Standard hybridization complexes may be quantified by comparing the values obtained using normal subjects with values from an experiment in which a known amount of a purified sequence is used. Standard values obtained in this manner may be compared with values obtained from samples from patients who were diagnosed with a particular condition, disease, or disorder. Deviation from standard values toward those associated with a particular disorder is used to diagnose that disorder.

Such assays may also be used to evaluate the efficacy of a particular therapeutic treatment regimen in animal studies or in clinical trials or to monitor the treatment of an individual patient. Once the presence of a condition is established and a treatment protocol is initiated, diagnostic assays may be repeated on a regular basis to determine if the level of expression in the patient begins to approximate that which is observed in a normal subject. The results obtained from successive assays may be used to show the efficacy of treatment over a period ranging from several days to years.

Protein Assays

Detection and quantification of a protein using either labeled amino acids or specific polyclonal or monoclonal antibodies which specifically bind the protein are known in the art. Examples of such techniques include two-dimensional polyacrylamide gel electrophoresis, enzyme-linked immunosorbent assays (ELISAs), radioimmunoassays (RIAs), and fluorescence activated cell sorting (FACS). These assays and their quantitation against purified, labeled standards are well known in the art (Ausubel, supra, unit 10.1-10.6). A two-site, monoclonal-based immunoassay utilizing monoclonal antibodies reactive to two non-interfering epitopes is preferred, but a competitive binding assay may be employed. (See, e.g., Coligan et al. (1997) Current Protocols in Immunology, Wiley-Interscience, New York N.Y.; and Pound, supra.)

Therapeutics

Chemical and structural similarity, in particular annotation and motifs that suggest function, are described for SEQ ID NO:16 in THE INVENTION section, and transcript images that suggest function for the proteins encoded or regulated by SEQ ID NO:1-15 are described in EXAMPLE VIII and EXAMPLE IX. In addition, the differential expression of each of the cDNAs was shown to be tissue-specific and associated with a particular disorder in EXAMPLE VIII. Thus, each protein clearly plays a role in at least one of the described disorders (adenofibromatous hyperplasia as a prognostic of prostate cancer, asthma, arthritis, breast cancers such as ductal, lobular, and adeno-carcinomas, Huntington's disease, mucinous cystadenoma of the ovary, renal cell cancer, schizophrenia, stomach tumor, testicular seminoma, transitional cell carcinoma of the bladder, and uterine adenosquamous carcinoma), and SP-1 through SP-16 may be used either directly as a therapeutic or as a target for drug discovery.

In one embodiment, increased expression of the protein may be treated by the delivery of an inhibitor, antagonist, antibody and the like or a pharmaceutical composition containing one or more of these molecules. Such delivery may be effected by methods well known in the art and may include delivery by an antibody specifically targeted to the diseased tissue.

In another embodiment, decreased expression of the protein late in the disease process may be treated by the delivery of the protein, an agonist, enhancer and the like or a pharmaceutical composition containing one or more of these molecules. Such delivery may be effected by methods well known in the art and may include delivery by an antibody specifically targeted to the diseased tissue.

Any of these compositions may be administered in combination with other therapeutic agents. Selection of the agents for use in combination therapy may be made by one of ordinary skill in the art according to conventional pharmaceutical principles. A combination of therapeutic agents may act synergistically to effect treatment of a particular cancer at a lower dosage of each agent than is needed when that agent is used alone.

Modification of Gene Expression Using Nucleic Acids

Gene expression may be modified by designing complementary or antisense molecules (DNA, RNA, or PNA) to the control, 5′, 3′, or other regulatory regions of the gene encoding a signal peptide-containing protein. Oligonucleotides designed to inhibit transcription initiation are preferred. Similarly, inhibition can be achieved using triple helix base-pairing which inhibits the binding of polymerases, transcription factors, or regulatory molecules (Gee et al. In: Huber and Carr (1994) Molecular and Immunologic Approaches, Futura Publishing, Mt. Kisco N.Y., pp. 163-177). A complementary molecule may also be designed to block translation by preventing binding between ribosomes and mRNA. In one alternative, a library or plurality of cDNAs may be screened to identify those which specifically bind a regulatory, nontranslated sequence.

Ribozymes, enzymatic RNA molecules, may also be used to catalyze the specific cleavage of RNA. The mechanism of ribozyme action involves sequence-specific hybridization of the ribozyme molecule to complementary target RNA followed by endonucleolytic cleavage at sites such as GUA, GUU, and GUC. Once such sites are identified, an oligonucleotide with the same sequence may be evaluated for secondary structural features which would render the oligonucleotide inoperable. The suitability of candidate targets may also be evaluated by testing their hybridization with complementary oligonucleotides using ribonuclease protection assays.
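
The first step described above, locating candidate cleavage sites in a target RNA, can be illustrated with a minimal sketch. The function name and the example sequence are hypothetical, and the sketch only scans for the three triplets named in the text; it does not evaluate secondary structure or hybridization.

```python
import re

# Triplets named in the text as examples of ribozyme cleavage sites.
CLEAVAGE_TRIPLETS = ("GUA", "GUU", "GUC")


def find_cleavage_sites(rna):
    """Return (0-based position, triplet) for each candidate site in rna.

    Hypothetical helper: input given in DNA letters is converted to RNA
    letters (T -> U) before scanning.
    """
    rna = rna.upper().replace("T", "U")
    # A lookahead is used so that the scan reports every start position.
    pattern = re.compile("(?=(" + "|".join(CLEAVAGE_TRIPLETS) + "))")
    return [(m.start(), m.group(1)) for m in pattern.finditer(rna)]


# Made-up target sequence for illustration only.
sites = find_cleavage_sites("AAGUCGGUAGUU")
print(sites)
```

Each reported position would then be a starting point for the structural and ribonuclease protection evaluations the text describes.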

Complementary nucleic acids and ribozymes of the invention may be prepared via recombinant expression, in vitro or in vivo, or using solid phase phosphoramidite chemical synthesis. In addition, RNA molecules may be modified to increase intracellular stability and half-life by addition of flanking sequences at the 5′ and/or 3′ ends of the molecule or by the use of phosphorothioate or 2′ O-methyl rather than phosphodiester linkages within the backbone of the molecule. Modification is inherent in the production of PNAs and can be extended to other nucleic acid molecules. Either the inclusion of nontraditional bases such as inosine, queuosine, and wybutosine, or the modification of adenine, cytidine, guanine, thymine, and uridine with acetyl-, methyl-, or thio- groups renders the molecule less available to endogenous endonucleases.

Screening and Purification Assays

A cDNA encoding a signal peptide-containing protein may be used to screen a library or a plurality of molecules or compounds for specific binding affinity. The libraries may be aptamers, DNA molecules, RNA molecules, PNAs, peptides, proteins such as transcription factors, enhancers, or repressors, and other ligands which regulate the activity, replication, transcription, or translation of the endogenous gene. The assay involves combining a polynucleotide with a library or plurality of molecules or compounds under conditions allowing specific binding, and detecting specific binding to identify at least one molecule which specifically binds the single-stranded or double-stranded molecule.

In one embodiment, the cDNA of the invention may be incubated with a plurality of purified molecules or compounds and binding activity determined by methods well known in the art, e.g., a gel-retardation assay (U.S. Pat. No. 6,010,849) or a commercially available reticulocyte lysate transcriptional assay. In another embodiment, the cDNA may be incubated with nuclear extracts from biopsied and/or cultured cells and tissues. Specific binding between the cDNA and a molecule or compound in the nuclear extract is initially determined by gel shift assay and may be later confirmed by recovering and raising antibodies against that molecule or compound. When these antibodies are added into the assay, they cause a supershift in the gel-retardation assay.

In another embodiment, the cDNA may be used to purify a molecule or compound using affinity chromatography methods well known in the art. In one embodiment, the cDNA is chemically reacted with cyanogen bromide groups on a polymeric resin or gel. Then a sample is passed over and reacts with or binds to the cDNA. The molecule or compound which is bound to the cDNA may be released from the cDNA by increasing the salt concentration of the flow-through medium and collected.

In a further embodiment, the protein or a portion thereof may be used to purify a ligand from a sample. A method for using a protein or a portion thereof to purify a ligand would involve combining the protein or a portion thereof with a sample under conditions to allow specific binding, detecting specific binding between the protein and ligand, recovering the bound protein, and using a chaotropic agent to separate the protein from the purified ligand.

In a preferred embodiment, a signal peptide-containing protein may be used to screen a plurality of molecules or compounds in any of a variety of screening assays. The portion of the protein employed in such screening may be free in solution, affixed to an abiotic or biotic substrate (e.g., borne on a cell surface), or located intracellularly. For example, in one method, viable or fixed prokaryotic host cells that are stably transformed with recombinant nucleic acids that have expressed and positioned a peptide on their cell surface can be used in screening assays. The cells are screened against a plurality or libraries of ligands, and the specificity of binding or formation of complexes between the expressed protein and the ligand can be measured. Depending on the particular kind of molecules or compounds being screened, the assay may be used to identify DNA molecules, RNA molecules, peptide nucleic acids, peptides, proteins, mimetics, agonists, antagonists, antibodies, immunoglobulins, inhibitors, and drugs or any other ligand, which specifically binds the protein.

In one aspect, this invention contemplates a method for high throughput screening using very small assay volumes and very small amounts of test compound as described in U.S. Pat. No. 5,876,946, incorporated herein by reference. This method is used to screen large numbers of molecules and compounds via specific binding. In another aspect, this invention also contemplates the use of competitive drug screening assays in which neutralizing antibodies capable of binding the protein specifically compete with a test compound capable of binding to the protein. Molecules or compounds identified by screening may be used in a model system to evaluate their toxicity, diagnostic, or therapeutic potential.

Pharmacology

Pharmaceutical compositions contain active ingredients in an effective amount to achieve a desired and intended purpose and a pharmaceutical carrier. The determination of an effective dose is well within the capability of those skilled in the art. For any compound, the therapeutically effective dose may be estimated initially either in cell culture assays or in animal models. The animal model is also used to achieve a desirable concentration range and route of administration. Such information may then be used to determine useful doses and routes for administration in humans.

A therapeutically effective dose refers to that amount of protein or inhibitor which ameliorates the symptoms or condition. Therapeutic efficacy and toxicity of such agents may be determined by standard pharmaceutical procedures in cell cultures or experimental animals, e.g., ED₅₀ (the dose therapeutically effective in 50% of the population) and LD₅₀ (the dose lethal to 50% of the population). The dose ratio between toxic and therapeutic effects is the therapeutic index, and it may be expressed as the ratio, LD₅₀/ED₅₀. Pharmaceutical compositions which exhibit large therapeutic indexes are preferred. The data obtained from cell culture assays and animal studies are used in formulating a range of dosage for human use.
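
The therapeutic index defined above is a simple ratio, and a short worked sketch may make the arithmetic concrete. The function name and the dose values are hypothetical and are not taken from any experiment in this document; both doses are assumed to be expressed in the same units.

```python
def therapeutic_index(ld50, ed50):
    """Return the ratio LD50/ED50 for doses given in the same units.

    Hypothetical helper; a larger value means a wider margin between
    the lethal dose and the therapeutically effective dose.
    """
    if ld50 <= 0 or ed50 <= 0:
        raise ValueError("doses must be positive")
    return ld50 / ed50


# Made-up doses (e.g., mg/kg) for illustration only.
print(therapeutic_index(500.0, 25.0))
```

Under these assumed values the lethal dose is twenty times the effective dose, which is the kind of large index the text describes as preferred.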

Model Systems

Animal models may be used as bioassays where they exhibit a phenotypic response similar to that of humans and where exposure conditions are relevant to human exposures. Mammals are the most common models, and most infectious agent, cancer, drug, and toxicity studies are performed on rodents such as rats or mice because of low cost, availability, lifespan, reproductive potential, and abundant reference literature. Inbred and outbred rodent strains provide a convenient model for investigation of the physiological consequences of under- or over-expression of genes of interest and for the development of methods for diagnosis and treatment of diseases. A mammal inbred to over-express a particular gene (for example, secreted in milk) may also serve as a convenient source of the protein expressed by that gene.

Toxicology

Toxicology is the study of the effects of agents on living systems. The majority of toxicity studies are performed on rats or mice. Observations of qualitative and quantitative changes in physiology, behavior, homeostatic processes, and lethality in the rats or mice are used to generate a toxicity profile and to assess potential consequences on human health following exposure to the agent.

Genetic toxicology identifies and analyzes the effect of an agent on the rate of endogenous, spontaneous, and induced genetic mutations. Genotoxic agents usually have common chemical or physical properties that facilitate interaction with nucleic acids and are most harmful when chromosomal aberrations are transmitted to progeny. Toxicological studies may identify agents that increase the frequency of structural or functional abnormalities in the tissues of the progeny if administered to either parent before conception, to the mother during pregnancy, or to the developing organism. Mice and rats are most frequently used in these tests because their short reproductive cycle allows the production of the numbers of organisms needed to satisfy statistical requirements.

Acute toxicity tests are based on a single administration of an agent to the subject to determine the symptomology or lethality of the agent. Three experiments are conducted: 1) an initial dose-range finding experiment, 2) an experiment to narrow the range of effective doses, and 3) a final experiment for establishing the dose-response curve.

Subchronic toxicity tests are based on the repeated administration of an agent. Rat and dog are commonly used in these studies to provide data from species in different families. With the exception of carcinogenesis, there is considerable evidence that daily administration of an agent at high-dose concentrations for periods of three to four months will reveal most forms of toxicity in adult animals.

Chronic toxicity tests, with a duration of a year or more, are used to demonstrate either the absence of toxicity or the carcinogenic potential of an agent. When studies are conducted on rats, a minimum of three test groups plus one control group are used, and animals are examined and monitored at the outset and at intervals throughout the experiment.

Transgenic Animal Models

Transgenic rodents that over-express or under-express a gene of interest may be inbred and used to model human diseases or to test therapeutic or toxic agents. (See, e.g., U.S. Pat. No. 5,175,383 and U.S. Pat. No. 5,767,337.) In some cases, the introduced gene may be activated at a specific time in a specific tissue type during fetal or postnatal development. Expression of the transgene is monitored by analysis of phenotype, of tissue-specific mRNA expression, or of serum and tissue protein levels in transgenic animals before, during, and after challenge with experimental drug therapies.

Embryonic Stem Cells

Embryonic stem (ES) cells isolated from rodent embryos retain the potential to form embryonic tissues. When ES cells are placed inside a carrier embryo, they resume normal development and contribute to tissues of the live-born animal. ES cells are the preferred cells used in the creation of experimental knockout and knockin rodent strains. Mouse ES cells, such as the mouse 129/SvJ cell line, are derived from the early mouse embryo and are grown under culture conditions well known in the art. Vectors used to produce a transgenic strain contain a disease gene candidate and a marker gene; the latter serves to identify the presence of the introduced disease gene. The vector is transformed into ES cells by methods well known in the art, and transformed ES cells are identified and microinjected into mouse cell blastocysts such as those from the C57BL/6 mouse strain. The blastocysts are surgically transferred to pseudopregnant dams, and the resulting chimeric progeny are genotyped and bred to produce heterozygous or homozygous strains.

ES cells derived from human blastocysts may be manipulated in vitro to differentiate into at least eight separate cell lineages. These lineages are used to study the differentiation of various cell types and tissues in vitro, and they include endodermal, mesodermal, and ectodermal cell types which differentiate into, for example, neural cells, hematopoietic lineages, and cardiomyocytes.

Knockout Analysis

In gene knockout analysis, a region of a gene is enzymatically modified to include a nonmammalian gene such as the neomycin phosphotransferase gene (neo; Capecchi (1989) Science 244:1288-1292). The modified gene is transformed into cultured ES cells and integrates into the endogenous genome by homologous recombination. The inserted sequence disrupts transcription and translation of the endogenous gene. Transformed cells are injected into rodent blastulae, and the blastulae are implanted into pseudopregnant dams. Transgenic progeny are crossbred to obtain homozygous inbred lines which lack a functional copy of the mammalian gene. In one example, the mammalian gene is a human gene.

Knockin Analysis

ES cells can be used to create knockin humanized animals (pigs) or transgenic animal models (mice or rats) of human diseases. With knockin technology, a region of a human gene is injected into animal ES cells, and the human sequence integrates into the animal cell genome. Transformed cells are injected into blastulae and the blastulae are implanted as described above. Transgenic progeny or inbred lines are studied and treated with potential pharmaceutical agents to obtain information on treatment of the analogous human condition. These methods have been used to model several human diseases.

Non-Human Primate Model

The field of animal testing deals with data and methodology from basic sciences such as physiology, genetics, chemistry, pharmacology and statistics. These data are paramount in evaluating the effects of therapeutic agents on non-human primates as they can be related to human health. Monkeys are used as human surrogates in vaccine and drug evaluations, and their responses are relevant to human exposures under similar conditions. Cynomolgus and Rhesus monkeys (Macaca fascicularis and Macaca mulatta, respectively) and Common Marmosets (Callithrix jacchus) are the most common non-human primates (NHPs) used in these investigations. Since great cost is associated with developing and maintaining a colony of NHPs, early research and toxicological studies are usually carried out in rodent models. In studies using behavioral measures such as drug addiction, NHPs are the first choice test animal. In addition, NHPs and individual humans exhibit differential sensitivities to many drugs and toxins and can be classified as a range of phenotypes from “extensive metabolizers” to “poor metabolizers” of these agents.

In additional embodiments, the cDNAs which encode the protein may be used in any molecular biology techniques that have yet to be developed, provided the new techniques rely on properties of cDNAs that are currently known, including, but not limited to, such properties as the triplet genetic code and specific base pair interactions.

EXAMPLES

I cDNA Library Construction

The UTRSNOT11 cDNA library was constructed from microscopically normal uterine tissue obtained from a 43-year-old female during a vaginal hysterectomy following diagnosis of uterine leiomyoma. Pathology indicated that the myometrium contained an intramural leiomyoma and a submucosal leiomyoma. The endometrium was proliferative; however, the cervix and fallopian tubes were unremarkable. The right and left ovaries contained corpora lutea. The patient presented with metrorrhagia and deficiency anemia. Patient history included benign hypertension and atherosclerosis. Medications included PROVERA tablets (Pharmacia, Peapack N.J.), iron, and vitamins. Family history included benign hypertension, atherosclerosis, and malignant colon neoplasms.

The frozen tissue was homogenized and lysed in TRIZOL reagent (1 gm tissue/10 ml reagent; Life Technologies) using a POLYTRON homogenizer (PT-3000; Brinkmann Instruments, Westbury N.Y.). After a brief incubation on ice, chloroform was added (1:5 v/v), and the lysate was centrifuged. The upper chloroform layer was removed to a fresh tube, and the RNA was extracted with isopropanol, resuspended in DEPC-treated water, and treated with DNAse for 25 min at 37 C. The RNA was re-extracted three times with acid phenol-chloroform, pH 4.7, and precipitated with 0.3 M sodium acetate and 2.5 volumes ethanol. The mRNA was isolated with the OLIGOTEX kit (Qiagen, Chatsworth Calif.) and used to construct the cDNA library.

The mRNA was handled according to the recommended protocols in the SUPERSCRIPT plasmid system (Life Technologies). The cDNAs were fractionated on a SEPHAROSE CL4B column (APB), and those cDNAs exceeding 400 bp were ligated into pINCY1 plasmid. The plasmid was subsequently transformed into DH5α competent cells (Life Technologies).

II Isolation of cDNA Clones

Plasmid DNA was released from the cells and purified using the REAL PREP 96 plasmid kit (Qiagen). This kit enabled the simultaneous purification of 96 samples in a 96-well block using multichannel reagent dispensers. The recommended protocol was employed except for the following changes: 1) the bacteria were cultured in 1 ml of sterile TERRIFIC BROTH (BD Biosciences, San Jose Calif.) with carbenicillin at 25 mg/L and glycerol at 0.4%; 2) after incubation for 19 hours, the cultures were lysed with 0.3 ml of lysis buffer; and 3) following isopropanol precipitation, the plasmid DNA pellet was resuspended in 0.1 ml of distilled water. After the last step in the protocol, samples were transferred to a 96-well block for storage at 4 C.

III Sequencing

The cDNAs were prepared for sequencing using the MICROLAB 2200 system (Hamilton) in combination with the DNA ENGINE thermal cyclers (MJ Research). The cDNAs were sequenced by the method of Sanger and Coulson (1975; J Mol Biol 94:441-448) using an ABI PRISM 373 or 377 sequencing system (ABI). Most of the isolates were sequenced according to standard ABI protocols and kits with solution volumes of 0.25×-1.0× concentrations or using standard solutions and dyes from APB.

IV Extension of cDNA Sequences

The cDNA sequence may be extended to full length using the Incyte clone, for example, SEQ ID NO:17, 2547002H1. A set of nested deletion sequencing templates was prepared from overnight liquid culture of clone 496071 using the ERASE-A-BASE system (Promega).

Sequencing reactions were performed with the ABI PRISM Dye Terminator cycle sequencing kit with AMPLITAQ FS DNA polymerase (ABI). PCR was performed on a DNA ENGINE thermal cycler (MJ Research). Reactions were analyzed on an ABI PRISM 310 genetic analyzer (ABI). Individual sequences were assembled and edited using ABI AutoAssembler software (ABI).

In the alternative, extension is accomplished using oligonucleotide primers synthesized to initiate 5′ and 3′ extension of the known fragment. These primers are designed using commercially available primer analysis software to be about 22 to 30 nucleotides in length, to have a GC content of about 50% or more, and to anneal to the target sequence at temperatures of about 68 C to about 72 C. Any stretch of nucleotides that would result in hairpin structures and primer-primer dimerizations is avoided.
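
Two of the primer criteria stated above, length and GC content, are easy to check programmatically. The sketch below is a hypothetical illustration and not the commercial primer analysis software the text refers to; the function names and example primers are made up, the "about" in the text is treated as a hard cutoff for simplicity, and annealing temperature, hairpins, and primer-primer dimers are not evaluated.

```python
def gc_fraction(seq):
    """Fraction of G and C bases in a non-empty nucleotide sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def passes_basic_primer_criteria(primer, min_len=22, max_len=30, min_gc=0.50):
    """Check only the length and GC-content criteria from the text.

    Hypothetical helper: the defaults mirror "about 22 to 30 nucleotides"
    and "GC content of about 50% or more".
    """
    if not primer:
        return False
    return min_len <= len(primer) <= max_len and gc_fraction(primer) >= min_gc


# Made-up candidate primers for illustration only.
print(passes_basic_primer_criteria("GCGCGCATATGCGCGCATATGCGC"))  # 24 nt, GC-rich
print(passes_basic_primer_criteria("ATATATATATATATATATATATAT"))  # 24 nt, no GC
```

A candidate that passes this coarse filter would still need the annealing-temperature and secondary-structure checks described in the text.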

Selected cDNA libraries are used as templates to extend the sequence. If more than one extension is necessary, additional or nested sets of primers are designed. Preferred libraries have been size-selected to include larger cDNAs and random primed to contain more sequences with 5′ or upstream regions of genes. Genomic libraries are used to obtain regulatory elements, especially extension into the 5′ promoter binding region.

High fidelity amplification is obtained by PCR using methods such as that taught in U.S. Pat. No. 5,932,451. PCR is performed in 96-well plates using the DNA ENGINE thermal cycler (MJ Research). The reaction mix contains DNA template, 200 nmol of each primer, reaction buffer containing Mg²⁺, (NH₄)₂SO₄, and β-mercaptoethanol, Taq DNA polymerase (APB), ELONGASE enzyme (Life Technologies), and Pfu DNA polymerase (Stratagene), with the following parameters for primer pair PCI A and PCI B (Incyte Genomics): Step 1: 94 C, three min; Step 2: 94 C, 15 sec; Step 3: 60 C, one min; Step 4: 68 C, two min; Step 5: Steps 2, 3, and 4 repeated 20 times; Step 6: 68 C, five min; Step 7: storage at 4 C. In the alternative, the parameters for primer pair T7 and SK+ (Stratagene) are as follows: Step 1: 94 C, three min; Step 2: 94 C, 15 sec; Step 3: 57 C, one min; Step 4: 68 C, two min; Step 5: Steps 2, 3, and 4 repeated 20 times; Step 6: 68 C, five min; Step 7: storage at 4 C.

The concentration of DNA in each well is determined by dispensing 100 μl PICOGREEN quantitation reagent (0.25% reagent in 1×TE, v/v; Molecular Probes) and 0.5 μl of undiluted PCR product into each well of an opaque fluorimeter plate (Corning, Acton Mass.) and allowing the DNA to bind to the reagent. The plate is scanned in a Fluoroskan II (Labsystems Oy, Finland) to measure the fluorescence of the sample and to quantify the concentration of DNA. A 5 μl to 10 μl aliquot of the reaction mixture is analyzed by electrophoresis on a 1% agarose minigel to determine which reactions are successful in extending the sequence.

The extended clones are desalted, concentrated, transferred to 384-well plates, digested with CviJI chlorella virus endonuclease (Molecular Biology Research, Madison Wis.), and sonicated or sheared prior to religation into pUC18 vector (APB). For shotgun sequences, the digested nucleotide sequences are separated on low concentration (0.6 to 0.8%) agarose gels, fragments are excised, and the agarose is digested with AGARACE enzyme (Promega). Extended clones are religated using T4 DNA ligase (New England Biolabs) into pUC18 vector (APB), treated with Pfu DNA polymerase (Stratagene) to fill in restriction site overhangs, and transfected into E. coli competent cells. Transformed cells are selected on antibiotic-containing media, and individual colonies are picked and cultured overnight at 37 C in 384-well plates in LB/2× carbenicillin liquid media.

The cells are lysed, and DNA is amplified using primers, Taq DNA polymerase (APB) and Pfu DNA polymerase (Stratagene) with the following parameters: Step 1: 94 C, three min; Step 2: 94 C, 15 sec; Step 3: 60 C, one min; Step 4: 72 C, two min; Step 5: steps 2, 3, and 4 repeated 29 times; Step 6: 72 C, five min; Step 7: storage at 4 C. DNA is quantified using PICOGREEN quantitation reagent (Molecular Probes) as described above. Samples with low DNA recoveries are reamplified using the conditions described above. Samples are diluted with 20% dimethylsulfoxide (DMSO; 1:2, v/v), and sequenced using DYENAMIC energy transfer sequencing primers and the DYENAMIC DIRECT cycle sequencing kit (APB) or the PRISM BIGDYE terminator cycle sequencing kit (ABI).

V Homology Searching of cDNA Clones and their Deduced Proteins

The cDNAs of the Sequence Listing or their deduced amino acid sequences were used to query databases such as GenBank, SwissProt, BLOCKS, and the like. These databases that contain previously identified and annotated sequences or domains were searched using BLAST or BLAST2 to produce alignments and to determine which sequences were exact matches or homologs. The alignments were to sequences of prokaryotic (bacterial) or eukaryotic (animal, fungal, or plant) origin. Alternatively, algorithms such as the one described in Smith and Smith (1992, Protein Engineering 5:35-51) could have been used to deal with primary sequence patterns and secondary structure gap penalties. All of the sequences disclosed in this application have lengths of at least 49 nucleotides, and no more than 12% uncalled bases (where N is recorded rather than A, C, G, or T).
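
The two sequence criteria stated at the end of the paragraph above (at least 49 nucleotides, no more than 12% uncalled bases) can be expressed as a small filter. This is a sketch under stated assumptions: the function name and example sequences are hypothetical, and the comparison uses integer arithmetic so that a sequence with exactly 12% N passes.

```python
def meets_disclosure_criteria(seq, min_len=49, max_n_percent=12):
    """Return True if seq is long enough and has few enough uncalled bases.

    Hypothetical helper: an uncalled base is any position recorded as N.
    """
    seq = seq.upper()
    if len(seq) < min_len:
        return False
    # Integer form of: count(N) / len(seq) <= max_n_percent / 100
    return seq.count("N") * 100 <= max_n_percent * len(seq)


# Made-up sequences for illustration only.
print(meets_disclosure_criteria("ACGT" * 12 + "A"))    # 49 nt, no N
print(meets_disclosure_criteria("N" * 7 + "A" * 43))   # 50 nt, 14% N
```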

As detailed in Karlin (supra), BLAST matches between a query sequence and a database sequence were evaluated statistically and only reported when they satisfied the threshold of 10⁻²⁵ for nucleotides and 10⁻¹⁴ for peptides. Homology was also evaluated by product score calculated as follows: the % nucleotide or amino acid identity [between the query and reference sequences] in BLAST is multiplied by the % maximum possible BLAST score [based on the lengths of query and reference sequences] and then divided by 100. In comparison with hybridization procedures used in the laboratory, the stringency for an exact match was set from a lower limit of about 40 (with 1-2% error due to uncalled bases) to a 100% match of about 70.
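
The product score arithmetic described above can be written out as a short sketch. The function name and the example numbers are hypothetical. How the maximum possible BLAST score is derived from the query and reference lengths is not specified in the text, so it is taken here as an input rather than computed.

```python
def product_score(percent_identity, blast_score, max_possible_score):
    """Product score = (% identity x % of maximum possible BLAST score) / 100.

    Hypothetical helper; max_possible_score is supplied by the caller
    because the text does not say how it is computed.
    """
    if max_possible_score <= 0:
        raise ValueError("max_possible_score must be positive")
    percent_of_max = 100.0 * blast_score / max_possible_score
    return percent_identity * percent_of_max / 100.0


# Made-up values: 90% identity, BLAST score at half of the maximum.
print(product_score(90.0, 100, 200))
```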

The BLAST software suite (available on the National Center for Biotechnology Information (“NCBI”) website, Bethesda Md.) includes various sequence analysis programs including “blastn” that is used to align nucleotide sequences and BLAST2 that is used for direct pairwise comparison of either nucleotide or amino acid sequences. BLAST programs are commonly used with gap and other parameters set to default settings, e.g.: Matrix: BLOSUM62; Reward for match: 1; Penalty for mismatch: −2; Open Gap: 5 and Extension Gap: 2 penalties; Gap x drop-off: 50; Expect: 10; Word Size: 11; and Filter: on. Identity is measured over the entire length of a sequence. Brenner et al. (1998; Proc Natl Acad Sci 95:6073-6078, incorporated herein by reference) analyzed BLAST for its ability to identify structural homologs by sequence identity and found 30% identity is a reliable threshold for sequence alignments of at least 150 residues and 40% for alignments of at least 70 residues.
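
The length-dependent identity thresholds attributed to Brenner et al. above can be sketched as a simple decision rule. The function name is hypothetical, and returning False for alignments shorter than 70 residues is an assumption of this sketch, since the text states no threshold for that case.

```python
def identity_is_reliable(percent_identity, aligned_residues):
    """Apply the two thresholds quoted in the text.

    Hypothetical helper:
      - alignments of at least 150 residues: 30% identity suffices
      - alignments of at least 70 residues: 40% identity is required
      - shorter alignments: treated as unreliable (assumption, not from the text)
    """
    if aligned_residues >= 150:
        return percent_identity >= 30.0
    if aligned_residues >= 70:
        return percent_identity >= 40.0
    return False


# Made-up alignments for illustration only.
print(identity_is_reliable(32.0, 200))
print(identity_is_reliable(32.0, 100))
```

The same 32% identity passes for the longer alignment and fails for the shorter one, which is the point of a length-dependent threshold.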

The cDNAs of this application were compared with assembled consensus sequences or templates found in the LIFESEQ GOLD database (Incyte Genomics). Component sequences from cDNA, extension, full length, and shotgun sequencing projects were subjected to PHRED analysis and assigned a quality score. All sequences with an acceptable quality score were subjected to various pre-processing and editing pathways to remove low quality 3′ ends, vector and linker sequences, polyA tails, Alu repeats, mitochondrial and ribosomal sequences, and bacterial contamination sequences. Edited sequences had to be at least 50 bp in length, and low-information sequences and repetitive elements such as dinucleotide repeats, Alu repeats, and the like, were replaced by “Ns” or masked.

Edited sequences were subjected to assembly procedures in which the sequences were assigned to gene bins. Each sequence could only belong to one bin, and sequences in each bin were assembled to produce a template. Newly sequenced components were added to existing bins using BLAST and CROSSMATCH. To be added to a bin, the component sequences had to have a BLAST quality score greater than or equal to 150 and an alignment of at least 82% local identity. The sequences in each bin were assembled using PHRAP. Bins with several overlapping component sequences were assembled using DEEP PHRAP. The orientation of each template was determined based on the number and orientation of its component sequences.
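
The two numeric criteria for adding a component sequence to a bin can be sketched as a filter over candidate alignments. All names and the candidate records below are hypothetical; the sketch does not run BLAST or CROSSMATCH and simply assumes that a quality score and a percent local identity have already been obtained for each candidate.

```python
def can_join_bin(blast_quality_score, local_identity_percent):
    """Criteria from the text: score >= 150 and local identity >= 82%."""
    return blast_quality_score >= 150 and local_identity_percent >= 82.0


# Made-up candidates: (component name, BLAST quality score, % local identity).
candidates = [
    ("component_a", 180, 95.0),
    ("component_b", 140, 99.0),  # score too low
    ("component_c", 200, 80.0),  # identity too low
    ("component_d", 150, 82.0),  # exactly at both thresholds
]

accepted = [name for name, score, ident in candidates if can_join_bin(score, ident)]
print(accepted)
```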

Bins were compared to one another, and those having local similarity of at least 82% were combined and reassembled. Bins having templates with less than 95% local identity were split. Templates were subjected to analysis by STITCHER/EXON MAPPER algorithms that determine the probabilities of the presence of splice variants, alternatively spliced exons, splice junctions, differential expression of alternative spliced genes across tissue types or disease states, and the like. Assembly procedures were repeated periodically, and templates were annotated using BLAST against GenBank databases such as GBpri. An exact match was defined as having from 95% local identity over 200 base pairs through 100% local identity over 100 base pairs and a homolog match as having an E-value (or probability score) of <1×10⁻⁸. The templates were also subjected to frameshift FASTx against GENPEPT, and homolog match was defined as having an E-value of <1×10⁻⁸. Template analysis and assembly was described in U.S. Ser. No. 09/276,534, filed Mar. 25, 1999.

Following assembly, templates were subjected to BLAST, motif, and other functional analyses and categorized in protein hierarchies using methods described in U.S. Ser. No. 08/812,290 and U.S. Ser. No. 08/811,758, both filed Mar. 6, 1997; in U.S. Ser. No. 08/947,845, filed Oct. 9, 1997; and in U.S. Ser. No. 09/034,807, filed Mar. 4, 1998. Then templates were analyzed by translating each template in all three forward reading frames and searching each translation against the PFAM database of hidden Markov model-based protein families and domains using the HMMER software package (Washington University School of Medicine, St. Louis Mo.; http://pfam.wustl.edu/). The cDNA was further analyzed using MACDNASIS PRO software (Hitachi Software Engineering), and LASERGENE software (DNASTAR) and queried against public databases such as the GenBank rodent, mammalian, vertebrate, prokaryote, and eukaryote databases, SwissProt, BLOCKS, PRINTS, PFAM, and Prosite.
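Translation in all three forward reading frames, as mentioned above, can be sketched in a few lines of Python. This is an illustrative sketch using the standard genetic code, not the software actually used; the function names are hypothetical, and codons containing N or other ambiguity codes are reported as X by assumption.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(seq):
    """Translate a DNA string codon by codon; unknown codons become X."""
    seq = seq.upper()
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


def three_frame_translations(seq):
    """Return the translations of forward frames 1, 2, and 3."""
    return [translate(seq[offset:]) for offset in range(3)]
```

Each of the three resulting protein strings would then be searched separately against a profile database; the search step itself is outside the scope of this sketch.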

VI Chromosome Mapping

Radiation hybrid and genetic mapping data available from public resources such as the Stanford Human Genome Center (SHGC), Whitehead Institute for Genome Research (WIGR), and Généthon are used to determine if any of the cDNAs presented in the Sequence Listing have been mapped. Any fragment of a cDNA encoding a signal peptide-containing protein that has been mapped results in the assignment of all related fragments and regulatory sequences to the same location. The genetic map locations are described as ranges, or intervals, of human chromosomes. The map position of an interval, in cM (1 cM is roughly equivalent to 1 megabase of human DNA), is measured relative to the terminus of the chromosomal p-arm.

VII Hybridization Technologies and Analyses

Immobilization of cDNAs on a Substrate

The cDNAs are applied to a substrate by one of the following methods. A mixture of cDNAs is fractionated by gel electrophoresis and transferred to a nylon membrane by capillary transfer. Alternatively, the cDNAs are individually ligated to a vector and inserted into bacterial host cells to form a library. The cDNAs are then arranged on a substrate by one of the following methods. In the first method, bacterial cells containing individual clones are robotically picked and arranged on a nylon membrane. The membrane is placed on LB agar containing selective agent (carbenicillin, kanamycin, ampicillin, or chloramphenicol depending on the vector used) and incubated at 37 C for 16 hr. The membrane is removed from the agar and consecutively placed colony side up in 10% SDS, denaturing solution (1.5 M NaCl, 0.5 M NaOH), neutralizing solution (1.5 M NaCl, 1 M Tris, pH 8.0), and twice in 2×SSC for 10 min each. The membrane is then UV irradiated in a STRATALINKER UV-crosslinker (Stratagene).

In the second method, cDNAs are amplified from bacterial vectors by thirty cycles of PCR using primers complementary to vector sequences flanking the insert. PCR amplification increases a starting concentration of 1-2 ng nucleic acid to a final quantity greater than 5 μg. Amplified nucleic acids from about 400 bp to about 5000 bp in length are purified using SEPHACRYL-400 beads (APB). Purified nucleic acids are arranged on a nylon membrane manually or using a dot/slot blotting manifold and suction device and are immobilized by denaturation, neutralization, and UV irradiation as described above. Purified nucleic acids are robotically arranged and immobilized on polymer-coated glass slides using the procedure described in U.S. Pat. No. 5,807,522. Polymer-coated slides are prepared by cleaning glass microscope slides (Corning, Acton Mass.) by ultrasound in 0.1% SDS and acetone, etching in 4% hydrofluoric acid (VWR Scientific Products, West Chester Pa.), coating with 0.05% aminopropyl silane (Sigma Aldrich) in 95% ethanol, and curing in a 110 C oven. The slides are washed extensively with distilled water between and after treatments. The nucleic acids are arranged on the slide and then immobilized by exposing the array to UV irradiation using a STRATALINKER UV-crosslinker (Stratagene). Arrays are then washed at room temperature in 0.2% SDS and rinsed three times in distilled water. Non-specific binding sites are blocked by incubation of arrays in 0.2% casein in phosphate buffered saline (PBS; Tropix, Bedford Mass.) for 30 min at 60 C; then the arrays are washed in 0.2% SDS and rinsed in distilled water as before.

Probe Preparation for Membrane Hybridization

Hybridization probes derived from the cDNAs of the Sequence Listing are employed for screening cDNAs, mRNAs, or genomic DNA in membrane-based hybridizations. Probes are prepared by diluting the cDNAs to a concentration of 40-50 ng in 45 μl TE buffer, denaturing by heating to 100 C for five min, and briefly centrifuging. The denatured cDNA is then added to a REDIPRIME tube (APB), gently mixed until blue color is evenly distributed, and briefly centrifuged. Five μl of [³²P]dCTP is added to the tube, and the contents are incubated at 37 C for 10 min. The labeling reaction is stopped by adding 5 μl of 0.2 M EDTA, and probe is purified from unincorporated nucleotides using a PROBEQUANT G-50 microcolumn (APB). The purified probe is heated to 100 C for five min, snap cooled for two min on ice, and used in membrane-based hybridizations as described below.

Probe Preparation for Polymer Coated Slide Hybridization

Hybridization probes derived from mRNA isolated from samples are employed for screening cDNAs of the Sequence Listing in array-based hybridizations. Probe is prepared using the GEMbright kit (Incyte Genomics) by diluting mRNA to a concentration of 200 ng in 9 μl TE buffer and adding 5 μl 5× buffer, 1 μl 0.1 M DTT, 3 μl Cy3 or Cy5 labeling mix, 1 μl RNase inhibitor, 1 μl reverse transcriptase, and 5 μl 1× yeast control mRNAs. Yeast control mRNAs are synthesized by in vitro transcription from noncoding yeast genomic DNA (W. Lei, unpublished). As quantitative controls, one set of control mRNAs at 0.002 ng, 0.02 ng, 0.2 ng, and 2 ng are diluted into reverse transcription reaction mixture at ratios of 1:100,000, 1:10,000, 1:1000, and 1:100 (w/w) to sample mRNA respectively. To examine mRNA differential expression patterns, a second set of control mRNAs are diluted into reverse transcription reaction mixture at ratios of 1:3, 3:1, 1:10, 10:1, 1:25, and 25:1 (w/w). The reaction mixture is mixed and incubated at 37 C for two hr. The reaction mixture is then incubated for 20 min at 85 C, and probes are purified using two successive CHROMA SPIN+TE 30 columns (Clontech, Palo Alto Calif.). Purified probe is ethanol precipitated by diluting probe to 90 μl in DEPC-treated water, adding 2 μl 1 mg/ml glycogen, 60 μl 5 M sodium acetate, and 300 μl 100% ethanol. The probe is centrifuged for 20 min at 20,800×g, and the pellet is resuspended in 12 μl resuspension buffer, heated to 65 C for five min, and mixed thoroughly. The probe is heated and mixed as before and then stored on ice. Probe is used in high density array-based hybridizations as described below.
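The quantitative control amounts follow directly from the stated weight ratios and the 200 ng of sample mRNA, and the listed component volumes can be summed to give the reaction volume. The short Python sketch below reproduces that arithmetic; the variable and function names are illustrative only, and the total volume is a sum of the listed components rather than a figure stated in the text.

```python
SAMPLE_MRNA_NG = 200.0  # sample mRNA per labeling reaction, from the text

# Component volumes in microliters, as listed in the text.
COMPONENT_VOLUMES_UL = {
    "mRNA in TE buffer": 9,
    "5x buffer": 5,
    "0.1 M DTT": 1,
    "Cy3 or Cy5 labeling mix": 3,
    "RNase inhibitor": 1,
    "reverse transcriptase": 1,
    "1x yeast control mRNAs": 5,
}

# Denominators of the 1:N (w/w) control-to-sample ratios, from the text.
CONTROL_RATIO_DENOMINATORS = [100_000, 10_000, 1_000, 100]


def control_mass_ng(sample_ng, ratio_denominator):
    """Mass of control mRNA giving a 1:ratio_denominator (w/w) spike."""
    return sample_ng / ratio_denominator


def total_volume_ul(components):
    """Sum of the listed component volumes."""
    return sum(components.values())


control_masses = [control_mass_ng(SAMPLE_MRNA_NG, d)
                  for d in CONTROL_RATIO_DENOMINATORS]
```

With 200 ng of sample mRNA, the four ratios correspond to approximately 0.002, 0.02, 0.2, and 2 ng of control mRNA, matching the amounts given above.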

Membrane-Based Hybridization

Membranes are pre-hybridized in hybridization solution containing 1% Sarkosyl and 1× high phosphate buffer (0.5 M NaCl, 0.1 M Na2HPO4, 5 mM EDTA, pH 7) at 55 C for two hr. The probe, diluted in 15 ml fresh hybridization solution, is then added to the membrane. The membrane is hybridized with the probe at 55 C for 16 hr. Following hybridization, the membrane is washed for 15 min at 25 C in 1 mM Tris (pH 8.0), 1% Sarkosyl, and four times for 15 min each at 25 C in 1 mM Tris (pH 8.0). To detect hybridization complexes, XOMAT-AR film (Eastman Kodak, Rochester N.Y.) is exposed to the membrane overnight at −70 C, developed, and examined visually.

Polymer Coated Slide-Based Hybridization

Probe is heated to 65 C for five min, centrifuged five min at 9400 rpm in a 5415C microcentrifuge (Eppendorf Scientific, Westbury N.Y.), and then 18 μl is aliquoted onto the array surface and covered with a coverslip. The arrays are transferred to a waterproof chamber having a cavity just slightly larger than a microscope slide. The chamber is kept at 100% humidity internally by the addition of 140 μl of 5×SSC in a corner of the chamber. The chamber containing the arrays is incubated for about 6.5 hr at 60 C. The arrays are washed for 10 min at 45 C in 1×SSC, 0.1% SDS, and three times for 10 min each at 45 C in 0.1×SSC, and dried.

Hybridization reactions are performed in absolute or differential hybridization formats. In the absolute hybridization format, probe from one sample is hybridized to array elements, and signals are detected after hybridization complexes form. Signal strength correlates with probe mRNA levels in the sample. In the differential hybridization format, differential expression of a set of genes in two biological samples is analyzed. Probes from the two samples are prepared and labeled with different labeling moieties. A mixture of the two labeled probes is hybridized to the array elements, and signals are examined under conditions in which the emissions from the two different labels are individually detectable. Elements on the array that are hybridized to equal numbers of probes derived from both biological samples give a distinct combined fluorescence (Shalon WO95/35505).

Hybridization complexes are detected with a microscope equipped with an Innova 70 mixed gas 10 W laser (Coherent, Santa Clara Calif.) capable of generating spectral lines at 488 nm for excitation of Cy3 and at 632 nm for excitation of Cy5. The excitation laser light is focused on the array using a 20× microscope objective (Nikon, Melville N.Y.). The slide containing the array is placed on a computer-controlled X-Y stage on the microscope and raster-scanned past the objective with a resolution of 20 micrometers. In the differential hybridization format, the two fluorophores are sequentially excited by the laser. Emitted light is split, based on wavelength, into two photomultiplier tube detectors (PMT R1477, Hamamatsu Photonics Systems, Bridgewater N.J.) corresponding to the two fluorophores. Filters positioned between the array and the photomultiplier tubes are used to separate the signals. The emission maxima of the fluorophores used are 565 nm for Cy3 and 650 nm for Cy5. The sensitivity of the scans is calibrated using the signal intensity generated by the yeast control mRNAs added to the probe mix. A specific location on the array contains a complementary DNA sequence, allowing the intensity of the signal at that location to be correlated with a weight ratio of hybridizing species of 1:100,000.

The output of the photomultiplier tube is digitized using a 12-bit RTI-835H analog-to-digital (A/D) conversion board (Analog Devices, Norwood Mass.) installed in an IBM-compatible PC computer. The digitized data are displayed as an image where the signal intensity is mapped using a linear 20-color transformation to a pseudocolor scale ranging from blue (low signal) to red (high signal). The data are also analyzed quantitatively. Where two different fluorophores are excited and measured simultaneously, the data are first corrected for optical crosstalk (due to overlapping emission spectra) between the fluorophores using the emission spectrum for each fluorophore. A grid is superimposed over the fluorescence signal image such that the signal from each spot is centered in each element of the grid. The fluorescence signal within each element is then integrated to obtain a numerical value corresponding to the average intensity of the signal. The software used for signal analysis is the GEMTOOLS program (Incyte Genomics).
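The linear 20-color mapping of 12-bit values and the per-element averaging can be illustrated as follows. This Python sketch is not the GEMTOOLS implementation; the equal-width binning, the square grid elements, and the function names are assumptions made for illustration.

```python
ADC_MAX = 4095   # largest value from a 12-bit A/D converter
N_COLORS = 20    # size of the pseudocolor scale, from the text


def pseudocolor_index(value):
    """Map a digitized intensity linearly to a color index.

    Index 0 stands for blue (low signal) and index 19 for red (high signal).
    Equal-width bins are an assumption.
    """
    clipped = min(max(value, 0), ADC_MAX)
    return min(clipped * N_COLORS // (ADC_MAX + 1), N_COLORS - 1)


def element_mean(image, top, left, size):
    """Average intensity inside one square grid element.

    image is a list of rows of digitized intensities; the element covers
    rows top..top+size-1 and columns left..left+size-1.
    """
    cells = [image[r][c]
             for r in range(top, top + size)
             for c in range(left, left + size)]
    return sum(cells) / len(cells)
```

For a two-color scan, any crosstalk correction would be applied to the pixel values before `element_mean` is computed; that step depends on the emission spectra and is not sketched here.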

VIII Transcript Imaging

A transcript image was performed using the LIFESEQ GOLD database (June 1 release, Incyte Genomics). This process allowed assessment of the relative abundance of the expressed polynucleotides in all of the cDNA libraries and reconfirmed the data submitted in U.S. Ser. No. 08/966,316, filed 7 Nov. 1997. Criteria for transcript imaging can be selected from category, number of cDNAs per library, library description, disease indication, clinical relevance of sample, and the like.

All sequences and cDNA libraries in the LIFESEQ database have been categorized by system, organ/tissue and cell type. For each category, the number of libraries in which the sequence was expressed was counted and shown over the total number of libraries in that category. In some transcript images, all normalized or pooled libraries, which have high copy number sequences removed prior to processing, and all mixed or pooled tissues, which are considered non-specific in that they contain more than one tissue type or more than one subject's tissue, can be excluded from the analysis. Treated and untreated cell lines and/or fetal tissue data can also be disregarded or removed where clinical relevance is emphasized. Conversely, fetal tissue may be emphasized wherever elucidation of inherited disorders or differentiation of particular cells or organs from stem cells (such as nerves, heart or kidney) would be furthered by removing clinical samples from the analysis. Transcript imaging can also be used to support data from other methodologies such as microarray analysis.

The transcript images for SEQ ID NOs:1-15 and 17 are shown below. The first column shows library name; the second column, the number of cDNAs sequenced in that library; the third column, the description of the library; the fourth column, absolute abundance of the transcript in the library; and the fifth column, percentage abundance of the transcript in the library.
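The fifth column is consistent with the fourth column divided by the second, expressed as a percentage. A minimal Python sketch of that calculation, and of a fold difference between two libraries, is given here; the function names are illustrative, and the example figures are taken from the SEQ ID NO:1 and SEQ ID NO:13 transcript images in this section.

```python
def percent_abundance(abundance, cdnas_sequenced):
    """Transcript count as a percentage of cDNAs sequenced in a library."""
    return 100.0 * abundance / cdnas_sequenced


def fold_difference(percent_a, percent_b):
    """Ratio of percentage abundance in library A to that in library B."""
    return percent_a / percent_b


# HNT2AGt01: 1 transcript among 5225 cDNAs (SEQ ID NO:1 image).
hnt2 = round(percent_abundance(1, 5225), 4)
# PITUNOT06: 808 transcripts among 6165 cDNAs (SEQ ID NO:13 image).
pitu = round(percent_abundance(808, 6165), 4)
```

Because most libraries show only one to a few copies of a transcript, fold differences computed this way rest on small counts and are best read as indicative.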

SEQ ID NO: 1
Category: Nervous System (Brain)

Library     cDNAs  Description of Tissue            Abundance  % Abundance
HNT2AGt01   5225   teratoCA line, hNT2, t/RA + MI   1          0.0191
BRADFDIT02  5908   frontal lobe, Huntington's, 57M  1          0.0169
BRAINOM01   24452  brain, infant, 10wF, NORM, WM    1          0.0041

In clinically-relevant brain samples, SEQ ID NO:1 is expressed four-fold higher in Huntington's disease with its associated dementia than in normal brain. Even though this GPCR is very sparsely expressed in human tissues, when SEQ ID NO:1 is used in a brain tissue-specific assay, it is diagnostic for Huntington's disease.

SEQ ID NO: 2
Category: Digestive System (Stomach)

Library     cDNAs  Description of Tissue               Abundance  % Abundance
STOMTUT01   2696   stomach adenoCA, 52M, m/STOMNOT02   3          0.1113
STOMTDE01   3971   stomach, aw/esophagus adenoCA, 61M  2          0.0504
STOMNOT02   3156   stomach, mw/adenoCA, 52M            1          0.0317

*Libraries made from normalized and pooled tissues were removed from this analysis.

SEQ ID NO:2 was greater than two-fold differentially expressed in a biopsied sample from the stomach of a subject diagnosed with adenocarcinoma over cytologically normal tissue from the same subject. Expression was not found in any other cytologically normal stomach tissue, which included STOMNOT01, STOMNOT08, and STOMTMR02. SEQ ID NO:2, when used in a stomach-specific assay, is diagnostic for adenocarcinoma.

SEQ ID NO: 3
Category: Exocrine Glands (Breast)

Library     cDNAs  Description of Tissue                  Abundance  % Abundance
BRSTTUT16   3724   ductal carcinoma, 43F, m/BRSTTMT01     2          0.0537
BRSTNOR01   3107   breast, mw/BRSTTUT22, lobular CA, 59F  1          0.0322
BRSTTMT02   3240   PF changes, mw/BRSTTUT16, 46F          1          0.0309
BRSTNOT09   3920   PF changes, mw/BRSTTUT08 adenoCA, 45F  1          0.0255

*Libraries made from normalized and pooled tissues were removed from this analysis.

SEQ ID NO:3 is differentially expressed in ductal carcinoma of the breast as compared with its matched cytologically normal BRSTTMT01. In addition, SEQ ID NO:3 was not expressed in BRSTNOT25 and BRSTNOT35, normal breast tissues removed during breast reduction surgeries, and was not as highly expressed in tissues diagnosed with any other disease states or their cytologically normal matched tissues. SEQ ID NO:3, when used in a breast-specific assay including, but not limited to, ductal lavage, is diagnostic for ductal carcinoma.

SEQ ID NO: 4
Category: Female Reproductive (Uterus)

Library     cDNAs  Description of Tissue                   Abundance  % Abundance
UTRSTUC01   1175   uterus adenosquamousCA, F, pool         2          0.1702
UTRENOT09   2791   uterus, endometrium, aw/cystocele, 38F  1          0.0358
UTRSNOT05   6678   uterus, mw/leiomyoma, 45F               1          0.0150
UTRSTUP05   16785  uterus serous papillary CA, F, pool     2          0.0119
UTRSTUP02   22349  uterus endometrial adenoCA, F, pool     2          0.0089

SEQ ID NO:4 is more than five-fold differentially expressed in adenosquamous carcinoma of the uterus. It was not differentially expressed in tissues from subjects diagnosed with cervicitis (UTRCNOP01, UTRCDIE01), endometriosis (UTREDIT07, UTREDIT14), cervical tumor (UTRCTUP01), endometrial adenocarcinoma (UTRSTUP03, UTRSTUP04, UTRSTUP07), or leiomyoma (UTRSTUE01, UTRSTUT04, UTRSTUT05, UTRSTUT07) or in cytologically normal tissues (UTRCNOP01, UTREDME05, UTREDME06, UTREDMF01, UTREDMF02, UTREDMT07, UTRENON03, UTRENOT10, UTRETMC01, UTRETUP01, UTRMTMR02, UTRMTMT01, UTRPNOM01, UTRSNON03, UTRSNOP01, UTRSNOR01, UTRSNOT01, UTRSNOT02, UTRSNOT06, UTRSNOT08, UTRSNOT10, UTRSNOT11, UTRSNOT12, UTRSNOT16, UTRSNOT18, UTRSTDT01, UTRSTMC01, UTRSTME01, UTRSTMR01, and UTRSTMR02). SEQ ID NO:4, when used in a uterus-specific assay, is diagnostic for adenosquamous carcinoma.

SEQ ID NO: 5
Category: Exocrine Glands (Breast)

Library     cDNAs  Description of Tissue               Abundance  % Abundance
BRSTTUT13   7631   breast adenoCA, 46F, m/BRSTNOT33    58         0.7601
BRSTNOT31   3102   breast, mw/ductal adenoCA, 57F      11         0.3546
BRSTNOT32   3766   nonfibrocystic breast disease, 46F  13         0.345

*Libraries made from normalized or pooled tissues and those containing less than 3000 cDNAs were removed from this analysis.

SEQ ID NO:5 is differentially expressed more than two-fold in adenocarcinoma of the breast when compared to expression in cytologically normal BRSTNOT31, BRSTNOT32 and matched BRSTNOT33. SEQ ID NO:5 was not differentially expressed in BRSTNOT25 and BRSTNOT35, normal breast tissues removed during breast reduction surgeries, and was not as highly expressed in tissues diagnosed with any other disease states or their cytologically normal matched tissues. SEQ ID NO:5, when used in a breast-specific assay including, but not limited to, ductal lavage, is diagnostic for adenocarcinoma.

SEQ ID NO: 6
Category: Female Reproductive (Ovary)

Library     cDNAs  Description of Tissue                   Abundance  % Abundance
OVARTUT02   3532   ovary tumor, mucinous cystadenoma, 51F  2          0.0566
OVARTUT07   3663   ovary, mw/follicular cysts, 28F         1          0.0273
OVARTUT13   3868   ovary, aw/leiomyoma, 47F                1          0.0259
OVARTUT07   4386   ovary tumor, adenoCA, 58F               1          0.0228
OVARNOT02   8870   ovary, aw/cardiomyopathy, 59F           1          0.0113

*Libraries made from normalized or pooled tissues were removed from this analysis.

SEQ ID NO:6 is differentially expressed more than two-fold in mucinous cystadenoma of the ovary when compared to expression in cytologically normal OVARNOT07, OVARNOT13, and OVARNOT02 and in ovary tissue from a subject diagnosed with adenocarcinoma. SEQ ID NO:6, when used in an ovary-specific assay, is diagnostic for mucinous cystadenoma.

SEQ ID NO: 7
Category: Musculoskeletal System (Cartilage, Synovium)

Library     cDNAs  Description of Tissue           Abundance  % Abundance
CARCTXT02   3594   knee chondrocytes, M/F, t/IL-1  4          0.1113
SYNOOAT01   5674   synovium, knee, OA, 82F         5          0.0881
SYNONOT01   4046   synovium, 75M*                  3          0.0741
SYNORAT03   5785   synovium, wrist, rheuA, 56F     4          0.0691
SYNORAT05   3466   synovium, knee, rheuA, 62F      2          0.0577
SYNORAT04   5636   synovium, wrist, rheuA, 62F     3          0.0532
CARGDIT02   3440   cartilage, OA, M/F              1          0.0291
CARGDIT01   7229   cartilage, OA                   2          0.0277
SYNORAB01   5053   synovium, hip, rheuA, 68F       1          0.0198

*Insufficient clinical data to rule out that this individual had some age-related arthritis.

SEQ ID NO:7 is preferentially expressed in IL-1 treated chondrocytes cultured from knee cartilage and in cartilage and synovia from subjects with rheumatoid arthritis and osteoarthritis. It was not expressed in normal control CARGNOT01. SEQ ID NO:7, when used in a tissue-specific assay, is diagnostic for arthritis.

SEQ ID NO: 8
Category: Male Reproductive (Testes)

Library     cDNAs  Description of Tissue     Abundance  % Abundance
TESTTUT03   3812   testicular seminoma, 45M  2          0.0525

SEQ ID NO:8 was significantly expressed in testicular seminoma; it was not expressed in normal tissue from TESTNOC01, TESTNOF01, TESTNOM01, TESTNON04, TESTNOP01, TESTNOT01, TESTNOT03, TESTNOT04, TESTNOT07, TESTNOT10, and TESTNOT11, or in embryonal carcinomas from TESTTUE02 and TESTTUT02. SEQ ID NO:8, when used in a clinically relevant, testicle-specific assay, is a diagnostic for testicular seminoma.

SEQ ID NO: 9
Category: Male Reproductive (Prostate)

Library     cDNAs  Description of Tissue               Abundance  % Abundance
PROSTMT05   3234   AH, mw/PROSTUT16 adenoCA, 55M       2          0.0618
PROSNOT19   3678   AH, mw/PROSTUT13 adenoCA, M         2          0.0544
PROSNOT07   3046   AH, mw/PROSTUT05 adenoCA, 69M       1          0.0328
PROSTMT07   3104   AH, mw/adenoCA, 73M                 1          0.0322
PROSDIN01   3421   AH, mw/PROSTUT10 adenoCA, 66, NORM  1          0.0292
PROSNOT28   3814   AH, mw/PROSTUT16 adenoCA, 55M       1          0.0262
PROSNOT15   4133   AH, mw/PROSTUT10 adenoCA, 66M       1          0.0242
PROSTMY01   6460   AH, mw/PROSTUT16 adenoCA, 55M       1          0.0155
PROSBPT02   6583   AH, mw/adenoCA, 65M                 1          0.0152

*Libraries made from subtracted or pooled tissues were removed from this analysis.

SEQ ID NO:9 was specifically expressed in prostate tissue cytologically showing adenofibromatous hyperplasia and matched with adenocarcinoma of the prostate (see PROSTUT matches above). It was not expressed in tissues from subjects diagnosed with benign prostatic hyperplasia (PROSBPS05, PROSBPT03, PROSDIP01, PROSDIP02, and PROSDIP03), or prostatic IN (PROETMP06, PROETMP07). SEQ ID NO:9, when used in a prostate-specific assay, is diagnostic for AH and may serve as an early diagnostic marker for prostatic adenocarcinoma.

SEQ ID NO: 10
Category: Urinary Tract (Bladder)

Library     cDNAs  Description of Tissue                    Abundance  % Abundance
BLADNOT05   3774   bladder mw/BLADTUT04 TC CA in situ, 60M  4          0.1060
BLADDIT01   3775   bladder, chronic cystitis, 73M           1          0.0265

*Libraries made from normalized tissues were removed from this analysis.

SEQ ID NO:10 showed five-fold differential expression in a cytologically normal bladder library which was matched with transitional cell carcinoma of the bladder. Expression of SEQ ID NO:10 was clearly distinct from that seen in tissue affected by chronic cystitis and was not seen in normal tissues, BLADNOR01, BLADNOT01, BLADNOT03, BLADNOT04, BLADNOT06, and BLADNOT08 or in the tumor libraries, BLADTUE01, BLADTUT02, BLADTUT03, BLADTUT04, BLADTUT05, BLADTUT06, BLADTUT07 and BLADTUT08. SEQ ID NO:10, when used in a bladder-specific assay, serves as an early diagnostic marker for transitional cell carcinoma of the bladder.

SEQ ID NO: 11
Category: Urinary Tract (Kidney)

Library     cDNAs  Description of Tissue            Abundance  % Abundance
KIDNTUT13   3771   renal cell CA, 51F               2          0.0530
KIDNTUT15   3941   renal cell CA, 65M, m/KIDNNOT19  2          0.0507
KIDNNOT19   6952   mw/KIDNTUT15 renal cell CA, 65M  2          0.0288
KIDNTUT14   3861   renal cell CA, 43M, m/KIDNNOT20  1          0.0259

*Libraries made from normalized, subtracted, and pooled tissues were removed from this analysis.

SEQ ID NO:11 is expressed in renal cell cancers and not expressed in cytologically normal kidney libraries (KIDNNOT01, KIDNNOT02, KIDNNOT20, KIDNNOT25, KIDNNOT26, KIDNNOT31, KIDNNOT32) or in KIDPTDE01 from a subject diagnosed with interstitial nephritis. SEQ ID NO:11, when used in a kidney-specific assay, serves as a diagnostic for renal cell cancer.

SEQ ID NO: 12
Category: Exocrine Glands (Breast)

Library     cDNAs  Description of Tissue             Abundance  % Abundance
BRSTTUT15   6535   adenocarcinoma, 46F, m/BRSTNOT17  2          0.0306

SEQ ID NO:12 is expressed in adenocarcinoma of the breast and not expressed in cytologically normal matched tissue. SEQ ID NO:12, when used in a breast-specific assay including, but not limited to, ductal lavage, serves as a diagnostic for adenocarcinoma of the breast.

SEQ ID NO: 13
Category: Endocrine Glands (Pituitary Gland)

Library     cDNAs  Description of Tissue                  Abundance  % Abundance
PITUNOT06   6165   Pituitary aw/schizophrenia, COPD, 55M  808        13.1062
PITUNOT02   226    Pituitary, 15-75M/F, pool              4          1.7699
PITUNOT01   8390   Pituitary, 16-70M/F, pool              87         1.0369
PITUNOT03   2857   Pituitary aw/colon cancer, 46M         15         0.5250
PITUDIR01   5981   Pituitary aw/AD, mets adenoCA, 70F     14         0.2341

*Libraries made from normalized tissues were removed from this analysis.

SEQ ID NO:13 is highly overexpressed in the pituitary gland removed from a schizophrenic subject with chronic obstructive pulmonary disease (COPD). Such high expression levels were not seen in pooled normal tissue or in the pituitaries of subjects with cancers and Alzheimer's disease (AD). SEQ ID NO:13, when used in a tissue-specific assay, serves as a diagnostic for schizophrenia.

SEQ ID NO: 14
Category: Exocrine Glands (Breast)

Library     cDNAs  Description of Tissue                    Abundance  % Abundance
BRSTTUT22   3774   Lobular CA/BRSTNOT16                     2          0.0530
BRSTNOT31   3102   mw/ductal adenoCA, 57F                   1          0.0322
BRSTDIT01   3394   PF changes, mw/intraductal cancer, 48F   1          0.0295
BRSTNOT28   3734   PF changes, 40F                          1          0.0268
BRSTNOT09   3920   PF changes, mw/BRSTTUT08 adenoCA, 45F    1          0.0255
BRSTNOT19   4019   mw/lobular CA, 67F                       1          0.0249
BRSTNOT23   4056   NF breast disease, 35F                   1          0.0247
BRSTNOT03   6777   PF changes, mw/BRSTTUT02 adenoCA, 54F    1          0.0148
BRSTNOT02   9077   PF changes, mw/BRSTTUT01 adenoCA, 55F    1          0.0110
BRSTNOT07   10055  PF changes, mw/intraductal adenoCA, 43F  1          0.0099

*Libraries made from normalized tissues were removed from this analysis.

SEQ ID NO:14 is differentially expressed in breast cancer, in particular, in lobular carcinoma. When used in a breast-specific assay including, but not limited to, ductal lavage, SEQ ID NO:14 serves as a diagnostic for breast cancer.

SEQ ID NO: 15
Category: Hemic Immune (Peripheral blood)

Library     cDNAs  Description of Tissue          Abundance  % Abundance
EOSINOT02   2356   eosinophils, asthma, M/F       5          0.2122
MPHGNOT03   7791   macrophages, M/F               4          0.0513
EOSINOT01   2404   eosinophils, nonallergic, M/F  1          0.0416

*Libraries made from treated cell lines were removed from this analysis.

SEQ ID NO:15 is 4-fold differentially expressed in peripheral blood, particularly eosinophils of asthmatics. When used in an assay of a lung sample, SEQ ID NO:15 is a diagnostic for asthma.

SEQ ID NO: 17
Category: Exocrine Gland (Breast)

Library*    cDNAs  Description of Tissue             Abundance  % Abundance
BRSTTUT14   3951   breast adenoCa, 62F, m/BRSTNOT14  1          0.0253

The transcript image confirms the information obtained in the original northern analysis (7 Nov. 1997). SEQ ID NO:17 is expressed in adenocarcinoma of the breast and not expressed in cytologically normal matched tissue, BRSTNOT14. Expression was absent from BRSTNOT25 and BRSTNOT35, normal breast tissues removed during breast reduction surgeries. When used in a breast-specific assay, including, but not limited to, ductal lavage, and compared with cancerous and normal standards, expression of SEQ ID NO:17 is diagnostic for breast adenocarcinoma.

In assays using normal and cancerous standards and patient samples, the cDNA, an mRNA, or an antibody specifically binding the protein can serve as a clinically relevant diagnostic marker for disorders associated with cell proliferation and cell signaling.

IX Northern Analyses

SEQ ID NOs:1-15 and 17 were compared with all the other sequences in the LIFESEQ database (Incyte Genomics, Palo Alto Calif.) using BLAST analysis (Altschul (1993) supra; Altschul (1990) supra). The results of the BLAST analyses were reported in THE INVENTION section above.

Each of the Incyte clones is also used to screen northern blots. A probe is generated by EcoRI digestion of the plasmid containing the cDNA. The restriction digest is fractionated on a 1% agarose gel, and a restriction fragment from about 400 to about 1400 nt in length is excised from the gel and purified on a QIAQUICK column (Qiagen). The fragment comprises the 5′-most region of the insert. The probe is prepared by random priming using the REDIPRIME labeling kit (APB) with REDIVUE [α-³²P]dCTP (3000 Ci/mmol; APB). Unincorporated radioactivity is removed by column chromatography using a SEPHADEX G-50 NICK column (APB).

Each commercial MTN blot (Clontech) contained approximately 2 μg of polyA+ RNA per lane from various tissues. Otherwise, RNA was electrophoresed on a denaturing formaldehyde, 1.2% agarose gel, blotted on a nylon membrane, and fixed by UV irradiation.

Blots are pre-hybridized in RAPID-HYB hybridization buffer (APB) for 1 hour at 65 C. Hybridizations are performed at 65 C using 0.5×10 cpm/ml probe for 1 hour. Blots are washed for 2×10 minutes in 1×SSC, 0.1% SDS at room temperature followed by 2 stringent washes at 65 C in 0.2×SSC, 0.1% SDS for 10 minutes each. Blots are wrapped in SARAN WRAP plastic film (Dow Chemical, Midland Mich.) and autoradiographed at −70 C using 2 intensifying screens and HYPERFILM-MP (APB).

The northern analysis for SEQ ID NO:17, Incyte clone 2547002, performed Wednesday, 5 Nov. 1997, showed expression in the following libraries of the LIFESEQ database (Incyte Genomics).

Library     Description
HEARNOT06   heart, 44M
HEAPNOT01   heart, coronary artery, plaque, pool
SMCANOT01   smooth muscle cell line, aorta, M
BRSTTUT14   breast tumor, adenocarcinoma, 62F, mw/BRSTNOT14
UTRSNOT16   uterus, endometrium, 48F
UTRSNOT11   uterus, myometrium, 43F
UTRSNOT02   uterus, 34F
LPARNOT02   parotid gland, 70M

When used in a breast sample-specific assay and compared with cancerous and normal standards, SEQ ID NO:17 is diagnostic for breast adenocarcinoma (library BRSTTUT14 above).

X Complementary Molecules

Molecules complementary to the cDNA, from about 5 (PNA) to about 5000 bp (complement of a cDNA insert), are used to detect or inhibit gene expression. Detection is described in Example VII. To inhibit transcription by preventing promoter binding, the complementary molecule is designed to bind to the most unique 5′ sequence and includes nucleotides of the 5′ UTR upstream of the initiation codon of the open reading frame. Complementary molecules include genomic sequences (such as enhancers or introns) and are used in “triple helix” base pairing to compromise the ability of the double helix to open sufficiently for the binding of polymerases, transcription factors, or regulatory molecules. To inhibit translation, a complementary molecule is designed to prevent ribosomal binding to the mRNA encoding the protein.

Complementary molecules are placed in expression vectors and used to transform a cell line to test efficacy; into an organ, tumor, synovial cavity, or the vascular system for transient or short term therapy; or into a stem cell, zygote, or other reproducing lineage for long term or stable gene therapy. Transient expression lasts for a month or more with a non-replicating vector and for three months or more if elements for inducing vector replication are used in the transformation/expression system.

Stable transformation of dividing cells with a vector encoding the complementary molecule produces a transgenic cell line, tissue, or organism (U.S. Pat. No. 4,736,866). Those cells that assimilate and replicate sufficient quantities of the vector to allow stable integration also produce enough complementary molecules to compromise or entirely eliminate activity of the cDNA encoding the protein.

XI Protein Expression

Expression and purification of the protein are achieved using either a mammalian or an insect cell expression system. The pUB6/V5-His vector system (Invitrogen, Carlsbad Calif.) is used to express signal peptide-containing proteins in CHO cells. The vector contains the selectable bsd gene, multiple cloning sites, the promoter/enhancer sequence from the human ubiquitin C gene, a C-terminal V5 epitope for antibody detection with anti-V5 antibodies, and a C-terminal polyhistidine (6×His) sequence for rapid purification on PROBOND resin (Invitrogen). Transformed cells are selected on media containing blasticidin.

Spodoptera frugiperda (Sf9) insect cells are infected with recombinant Autographa californica nuclear polyhedrosis virus (baculovirus). The polyhedrin gene is replaced with the cDNA by homologous recombination, and the polyhedrin promoter drives cDNA transcription. The protein is synthesized as a fusion protein with 6×His, which enables purification as described above. Purified protein is used in the following activity assays and to make antibodies.

XII Production of Antibodies

A signal peptide-containing protein is purified using polyacrylamide gel electrophoresis and used to immunize mice or rabbits. Antibodies are produced using the protocols well known in the art and summarized below. Alternatively, the amino acid sequence of signal peptide-containing proteins is analyzed using LASERGENE software (DNASTAR) to determine regions of high antigenicity. An antigenic epitope, usually found near the C-terminus or in a hydrophilic region, is selected, synthesized, and used to raise antibodies. Typically, epitopes of about 15 residues in length are produced using a 431A peptide synthesizer (ABI) using Fmoc-chemistry and coupled to KLH (Sigma-Aldrich) by reaction with m-maleimidobenzoyl-N-hydroxysuccinimide ester to increase antigenicity.

Rabbits are immunized with the epitope-KLH complex in complete Freund's adjuvant. Immunizations are repeated at intervals thereafter in incomplete Freund's adjuvant. After a minimum of seven weeks for mouse or twelve weeks for rabbit, antisera are drawn and tested for antipeptide activity.

Testing involves binding the peptide to plastic, blocking with 1% bovine serum albumin, reacting with rabbit antisera, washing, and reacting with radio-iodinated goat anti-rabbit IgG. Methods well known in the art are used to determine antibody titer and the amount of complex formation.

XIII Purification of Naturally Occurring Protein Using Specific Antibodies

Naturally occurring or recombinant protein is purified by immunoaffinity chromatography using antibodies which specifically bind the protein. An immunoaffinity column is constructed by covalently coupling the antibody to CNBr-activated SEPHAROSE resin (APB). Media containing the protein is passed over the immunoaffinity column, and the column is washed using high ionic strength buffers in the presence of detergent to allow preferential adsorption of the protein. After coupling, the protein is eluted from the column using a buffer of pH 2-3 or a high concentration of urea or thiocyanate ion to disrupt antibody/protein binding, and the protein is collected.

XIV Screening Molecules for Specific Binding with the cDNA or Protein

The cDNA, or fragments thereof, or the protein, or portions thereof, are labeled with ³²P-dCTP, Cy3-dCTP, or Cy5-dCTP (APB), or with BODIPY or FITC (Molecular Probes, Eugene Oreg.), respectively. Libraries of candidate molecules or compounds previously arranged on a substrate are incubated in the presence of labeled cDNA or protein. After incubation under conditions for either a nucleic acid or amino acid sequence, the substrate is washed, and any position on the substrate retaining label, which indicates specific binding or complex formation, is assayed, and the ligand is identified. Data obtained using different concentrations of the nucleic acid or protein are used to calculate affinity between the labeled nucleic acid or protein and the bound molecule.
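The affinity calculation from signals measured at different concentrations can be sketched as follows. The specification does not state which binding model is used; this example assumes a simple one-site equilibrium model, B = Bmax·c/(Kd + c), and estimates Kd from a double-reciprocal straight-line fit. The function name, the units, and the data are hypothetical.

```python
def fit_one_site(concentrations, signals):
    """Estimate (Kd, Bmax) from a double-reciprocal fit of B = Bmax*c/(Kd + c).

    1/B = (Kd/Bmax) * (1/c) + 1/Bmax, so a straight-line fit of 1/B against 1/c
    gives slope = Kd/Bmax and intercept = 1/Bmax.
    """
    xs = [1.0 / c for c in concentrations]
    ys = [1.0 / b for b in signals]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Ordinary least-squares slope and intercept.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    bmax = 1.0 / intercept
    kd = slope * bmax
    return kd, bmax


# Hypothetical, noise-free data generated from Kd = 50 nM and Bmax = 1000 signal units.
conc_nM = [5, 10, 25, 50, 100, 200]
signal = [1000 * c / (50 + c) for c in conc_nM]
kd, bmax = fit_one_site(conc_nM, signal)
print(f"Kd ~ {kd:.1f} nM, Bmax ~ {bmax:.0f}")
```

The double-reciprocal form is used here only because it needs no external library; it weights the lowest concentrations heavily, so a nonlinear fit of the untransformed data is generally preferable for noisy measurements.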

XV Two-Hybrid Screen

A yeast two-hybrid system, MATCHMAKER LexA Two-Hybrid system (Clontech Laboratories, Palo Alto Calif.), is used to screen for peptides that bind the protein of the invention. A cDNA encoding the protein is inserted into the multiple cloning site of a pLexA vector, ligated, and transformed into E. coli. cDNA, prepared from mRNA, is inserted into the multiple cloning site of a pB42AD vector, ligated, and transformed into E. coli to construct a cDNA library. The pLexA plasmid and pB42AD-cDNA library constructs are isolated from E. coli and used in a 2:1 ratio to co-transform competent yeast EGY48[p8op-lacZ] cells using a polyethylene glycol/lithium acetate protocol. Transformed yeast cells are plated on synthetic dropout (SD) media lacking histidine (His), tryptophan (Trp), and uracil (Ura), and incubated at 30° C until the colonies have grown up and are counted. The colonies are pooled in a minimal volume of 1×TE (pH 7.5), replated on SD/-His/-Leu/-Trp/-Ura media supplemented with 2% galactose (Gal), 1% raffinose (Raf), and 80 mg/ml 5-bromo-4-chloro-3-indolyl β-D-galactopyranoside (X-Gal), and subsequently examined for growth of blue colonies. Interaction between expressed protein and cDNA fusion proteins activates expression of a LEU2 reporter gene in EGY48 and produces colony growth on media lacking leucine (-Leu). Interaction also activates expression of β-galactosidase from the p8op-lacZ reporter construct that produces blue color in colonies grown on X-Gal.

Positive interactions between expressed protein and cDNA fusion proteins are verified by isolating individual positive colonies and growing them in SD/-Trp/-Ura liquid medium for 1 to 2 days at 30° C. A sample of the culture is plated on SD/-Trp/-Ura media and incubated at 30° C until colonies appear. The sample is replica-plated on SD/-Trp/-Ura and SD/-His/-Trp/-Ura plates. Colonies that grow on SD containing histidine but not on media lacking histidine have lost the pLexA plasmid. Histidine-requiring colonies are grown on SD/Gal/Raf/X-Gal/-Trp/-Ura, and white colonies are isolated and propagated. The pB42AD-cDNA plasmid, which contains a cDNA encoding a protein that physically interacts with the protein, is isolated from the yeast cells and characterized.

XVI Demonstration of Protein Activity

Cell Proliferation

SP can be expressed in a mammalian cell line such as DLD-1 or HCT116 (ATCC; Manassas Va.) by transforming the cells with a eukaryotic expression vector encoding SP. Other eukaryotic expression vectors, such as those mentioned in EXAMPLE XI above, are commercially available, and the techniques to introduce them into cells are well known to those skilled in the art. The effect of SP on cell morphology can be visualized by microscopy; the effect on cell growth can be determined by measuring cell doubling-time; and the effect on tumorigenicity can be assessed by the ability of transformed cells to grow in a soft agar growth assay (Groden et al. (1995) Cancer Res. 55:1531-1539).
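The cell doubling-time measurement mentioned above can be reduced to a short calculation. The sketch below assumes exponential growth between two cell counts, so that doubling time = t·ln 2 / ln(N_end/N_start). The function name and the counts are hypothetical and are not taken from the specification.

```python
import math


def doubling_time(n_start, n_end, elapsed_hours):
    """Doubling time in hours, assuming exponential growth between two counts."""
    if n_start <= 0 or n_end <= n_start:
        raise ValueError("counts must be positive and increasing")
    return elapsed_hours * math.log(2) / math.log(n_end / n_start)


# Hypothetical counts, for illustration only.
control = doubling_time(1.0e5, 8.0e5, 72)      # three doublings in 72 h
transformed = doubling_time(1.0e5, 3.2e6, 72)  # five doublings in 72 h
print(round(control, 1), round(transformed, 1))
```

Comparing the value for cells transformed with the SP expression vector against cells carrying an empty vector would indicate whether SP expression shortens or lengthens the doubling time under this assumption.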

Receptor SPs such as those encoded by SEQ ID NOs: 17, 15, 12, 6, and 1 can be expressed in heterologous expression systems and their biological activity tested utilizing the purinergic receptor system (P2U) as published by Erb et al. (1993; Proc Natl Acad Sci 90:10449-53). Because cultured K562 human leukemia cells lack P2U receptors, they can be transfected with expression vectors containing either normal or chimeric P2U and loaded with fura-2, a fluorescent probe for Ca⁺⁺. Activation of properly assembled and functional extracellular SP-transmembrane/intracellular P2U receptors with extracellular UTP or ATP mobilizes intracellular Ca⁺⁺ which reacts with fura-2 and is measured spectrofluorometrically. Bathing the transfected K562 cells in microwells containing appropriate ligands will trigger binding and fluorescent activity identifying effectors of SP. The P2U system is also useful for identifying antagonists or inhibitors which block binding and prevent such fluorescent reactions.

All patents and publications mentioned in the specification are incorporated by reference herein. Various modifications and variations of the described method and system of the invention will be apparent to those skilled in the art without departing from the scope and spirit of the invention. Although the invention has been described in connection with specific preferred embodiments, it should be understood that the invention as claimed should not be unduly limited to such specific embodiments. Indeed, various modifications of the described modes for carrying out the invention that are obvious to those skilled in the field of molecular biology or related fields are intended to be within the scope of the following claims.

TABLE 1

SEQUENCES      DESIGNATION  INCYTE CLONE  LIBRARY    HOMOLOG   GENBANK DESCRIPTOR
SEQ ID NO: 1   SP-1         1221102       NEUTGMT01  g1575512  GPR19 gene
SEQ ID NO: 2   SP-2         1457779       COLNFET02  g1842120  ATP diphosphohydrolase
SEQ ID NO: 3   SP-3         1682433       PROSNOT15  g1070391  transmembrane protein
SEQ ID NO: 4   SP-4         1899132       BLADTUT06  g887602   Saccharomyces cerevisiae protein
SEQ ID NO: 5   SP-5         1907344       CONNTUT01  g33715    immunoglobulin light chain
SEQ ID NO: 6   SP-6         1963651       BRSTNOT04  g1657623  orphan receptor RDC1
SEQ ID NO: 7   SP-7         1976095       PANCTUT02  g2117185  Mycobacterium tuberculosis protein
SEQ ID NO: 8   SP-8         2417676       HNT3AZT01  g2150012  human transmembrane protein
SEQ ID NO: 9   SP-9         1805538       SINTNOT13  g294502   extracellular matrix protein
SEQ ID NO: 10  SP-10        1869688       SKINBIT01  g1562     G3 serine/threonine kinase
SEQ ID NO: 11  SP-11        1880692       LEUKNOT03  g1487910  Caenorhabditis elegans protein
SEQ ID NO: 12  SP-12        318060        EOSIHET02  g606788   opioid receptor
SEQ ID NO: 13  SP-13        396450        PITUNOT02  g342279   opiomelanocortin
SEQ ID NO: 14  SP-14        506333        TMLR3DT02  g2204110  adenylyl cyclase type VII
SEQ ID NO: 15  SP-15        764465        LUNGNOT04  g1902984  lectin-like oxidized LDL receptor
SEQ ID NO: 16  SP-16        2547007       UTRSNOT11  g399711   bovine GPCR
SEQ ID NO: 17               2547007       UTRSNOT11

What is claimed is:
 1. A purified protein comprising an amino acid sequence selected from: a) a protein having an amino acid sequence of SEQ ID NO:16, b) a naturally occurring protein having an amino acid sequence at least 90% identical to an amino acid sequence of SEQ ID NO:16, c) a biologically active fragment of a protein having an amino acid sequence of SEQ ID NO:16, and d) an epitope of a protein having an amino acid sequence of SEQ ID NO:16.
 2. A purified protein of claim 1, consisting of an amino acid sequence of SEQ ID NO:16.
 3. A composition comprising a protein of claim 1, and a labeling moiety or a pharmaceutical carrier.
 4. A method for using a protein to screen a plurality of molecules or compounds to identify at least one ligand, the method comprising: a) combining a protein of claim 1, with the molecules or compounds under conditions to allow specific binding; and b) detecting specific binding, thereby identifying a ligand which specifically binds the protein.
 5. The method of claim 4, wherein the molecules or compounds are selected from DNA molecules, RNA molecules, peptide nucleic acids, peptides, proteins, mimetics, agonists, antagonists, antibodies, immunoglobulins, inhibitors, and drugs.
 6. An isolated antibody which specifically binds to a protein of claim 1.
 7. A method of using a protein to prepare and purify a polyclonal antibody comprising: a) immunizing an animal with a protein of claim 1, under conditions to elicit an antibody response; b) isolating animal antibodies; c) attaching the protein to a substrate; d) contacting the substrate with isolated antibodies under conditions to allow specific binding to the protein; e) dissociating the antibodies from the protein, thereby obtaining purified polyclonal antibodies.
 8. A method of using a protein to prepare a monoclonal antibody comprising: a) immunizing an animal with a protein of claim 1, under conditions to elicit an antibody response; b) isolating antibody producing cells from the animal; c) fusing the antibody producing cells with immortalized cells in culture to form monoclonal antibody producing hybridoma cells; d) culturing the hybridoma cells; and e) isolating from culture monoclonal antibodies which specifically bind the protein.
 9. A polyclonal antibody produced by the method of claim 7.
 10. A monoclonal antibody produced by the method of claim 8.
 11. A method for using an antibody to detect expression of a protein in a sample, the method comprising: a) combining the antibody of claim 6, with a sample under conditions which allow the formation of antibody:protein complexes; and b) detecting complex formation, wherein complex formation indicates expression of the protein in the sample.
 12. The method of claim 11, wherein complex formation is compared with standards and is diagnostic of cancer.
 13. A composition comprising an antibody of claim 6, and a therapeutic agent.
 14. A method for testing a compound for effectiveness as an agonist, the method comprising: a) exposing a sample comprising a protein of claim 1, to a compound, and b) detecting agonist activity in the sample.
 15. A composition comprising an agonist identified by the method of claim 13, and a pharmaceutical carrier.
 16. A method for testing a compound for effectiveness as an antagonist, the method comprising: a) exposing a sample comprising a protein of claim 1, to a compound, and b) detecting antagonist activity in the sample.
 17. A composition comprising an antagonist identified by the method of claim 16, and a pharmaceutical carrier.
 18. A method for treating a disorder associated with decreased expression of a signal peptide-containing protein comprising administering to a subject in need of such treatment the composition of claim 15.
 19. A method for treating a disorder associated with increased expression of a signal peptide-containing protein comprising administering to a subject in need of such treatment the composition of claim 17.
 20. An array containing the antibody of claim 6.